Codex is OpenAI’s first-party developer product focused on agentic software engineering. We’re building tools that help engineers design, write, test, and ship code faster—safely and at scale. We partner tightly with research and product to translate model advances into tangible developer productivity.
As a Data Scientist on Codex, you will measure and accelerate product-market fit for AI developer tools. You’ll define what “developer productivity” means for our product, run experiments on new coding models and UX, and pinpoint where the model helps or hurts across languages and tasks. Your insights will directly shape how an entire industry builds software.
This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
In this role, you will:
Embed with the Codex product team to discover opportunities that improve developer outcomes and growth
Design and interpret A/B tests and staged rollouts of new coding models and product features
Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity
Build dashboards and analyses that let the team self-serve answers to product questions, sliced by language, framework, repo size, and task type
Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals)
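To give a flavor of the experimentation work above, here is a minimal, illustrative sketch of comparing suggestion acceptance rates between two arms of an A/B test with a two-proportion z-test. All counts, names, and thresholds here are hypothetical examples, not details from this posting, and a real analysis would account for user-level clustering and multiple metrics.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(accepted_a, shown_a, accepted_b, shown_b):
    """Two-sided z-test for the difference in suggestion acceptance
    rates between a control arm (a) and a treatment arm (b)."""
    p_a = accepted_a / shown_a
    p_b = accepted_b / shown_b
    # Pooled acceptance rate under the null hypothesis of no difference
    p_pool = (accepted_a + accepted_b) / (shown_a + shown_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / shown_a + 1 / shown_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Hypothetical counts: suggestions shown vs. accepted per arm
p_a, p_b, z, p = two_proportion_ztest(3100, 10000, 3300, 10000)
print(f"control={p_a:.3f} treatment={p_b:.3f} z={z:.2f} p={p:.4f}")
```

In practice this kind of test would be one small piece of a staged rollout: the same acceptance-rate comparison would be broken down by language, task type, and repo size to find where a new model helps or hurts.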
You might thrive in this role if you have:
5+ years in a quantitative role, ideally working on a developer-facing or high-growth product
Fluency in SQL and Python; comfort with experiment design and causal inference
Experience defining product metrics tied to user value
Ability to communicate clearly with PM, Eng, and Design—and to influence product direction
Strong programming background; ability to prototype, run simulations, and reason about code quality
Nice to have:
Familiarity with IDE/extension telemetry or developer tooling analytics
Prior experience with NLP/LLMs, code models, or evaluations for generative coding