The Strategic Deployment team makes frontier models more capable, reliable, and aligned to transform high-impact domains. First, we deploy models in real-world, high-stakes settings, both to drive AI transformation and to elicit insights (training data, evaluation methods, and techniques) that shape our frontier model development. Second, we use these learnings to build the science and engineering of impactful frontier model deployment.
Put differently, we want to understand: if AGI is viewed as AI that can fundamentally transform our economy, how close are we to AGI? What’s still missing? How do we bridge these gaps?
About the Role
As a Research Engineer/Research Scientist on the Strategic Deployment team, you’ll tackle fundamental research questions and AI engineering challenges grounded in real-world deployment. Your work will build the science and engineering of reliable customization of frontier models, and develop understanding and evaluations that both fuel strategically important AI deployments and guide OpenAI’s frontier model program.
This role is based in San Francisco, CA. We follow a hybrid model (3 days/week in-office) and offer relocation support.
In this role, you will:
Conduct research on real-world-informed model generalization, robustness, and steerability.
Design challenging evaluations that capture real-world utility and reveal frontier model capability gaps.
Use real deployments to generate learnings that guide OpenAI’s frontier model program.
Collaborate with other researchers, infrastructure teams, and, in some cases, domain experts or partners in strategic industries to surface major impact opportunities.
You might thrive in this role if you:
Have a research background in ML, deep learning, or related areas (e.g., RL, evaluation science, robustness).
Have strong engineering skills and are comfortable diving into a large ML codebase to debug and improve it.
Are interested in foundational research informed by real-world deployment—not just paper benchmarks.
Are excited by ambiguous, open-ended problem spaces with high impact and stakes.
Enjoy building tools, datasets, or infrastructure to elicit deeper insight.
Want to shape how frontier models evolve toward AGI.