OpenAI’s Platform team powers how millions of developers and enterprises build with our models. We provide APIs and agentic solutions used by startups and Fortune 500 companies worldwide. We work closely with product, engineering, design, and go-to-market to build a world-class platform that pushes the frontier of AI capabilities.
As a Data Scientist on the Platform team, you will drive a data-driven culture for OpenAI’s API and B2B solutions. You’ll define the metrics that matter for developer success and enterprise value, measure the impact of new models and features, and partner with PMs and engineers to improve model quality, reliability, latency, and cost. Your work will shape how thousands of products adopt agentic AI.
This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
Embed with the Platform product team as a trusted partner, uncovering ways to improve developer experience, reliability, and usage growth
Define north-star metrics across the developer funnel (activation, retention, growth), as well as latency/cost guardrails for new features and models
Design and interpret A/B tests and controlled rollouts (e.g., new model versions, pricing/limits, new API features, new B2B products)
Build source-of-truth dashboards and self-serve data tools for product, engineering, and go-to-market teams
Translate product learnings into actionable feedback for Research (e.g., failure modes, eval gaps, model response quality)
5+ years in a quantitative role in ambiguous, high-growth environments (experience with platforms, APIs, or B2B products is a plus)
Depth in SQL and Python, with a track record of proposing, designing, and running rigorous experiments
Experience defining and operationalizing metrics from scratch (including reliability, latency, cost, and safety)
Strong cross-functional communication with PMs, engineers, and executives
Strategic instincts beyond p-values, with clear thinking about tradeoffs and business impact
Background in developer platforms, observability, or usage-based pricing/quotas
Experience connecting offline evals to online product impact
Prior work in NLP/LLMs or agentic systems, or in enterprise analytics for B2B products