Google DeepMind · London

Research Scientist, Science of Post-Training and Reinforcement Learning

3/4/2026

Description

You will work closely with Ian Osband and the team on research around post-training for agents and LLMs, including practical RL methods and evaluation. This is not a theory-only role; you should expect to implement code, run experiments, and own results end-to-end. Success in this role is defined by whether the team learns faster and whether the work produced is crisp, honest, and high-quality.

Key Responsibilities

  • Propose and test research hypotheses in post-training and RL for agents/LLMs.

  • Implement algorithm ideas and run end-to-end experiments, including setup, execution, analysis, and iteration.

  • Design evaluations and ablations that answer real questions and change minds.

  • Analyze results carefully, including debugging and failure analysis.

  • Communicate clearly through plots, writeups, and paper-ready narratives and figures.

  • Collaborate closely with engineering and research partners to keep the team aligned on findings and strategy.

  • Contribute to a culture of first-principles thinking, high standards, and direct, constructive feedback.

Qualifications

  • A research track record in ML/RL, demonstrated through publications or high-quality projects.

  • Strong implementation ability and comfort working in research codebases.

  • Evidence of owning experiments end-to-end, including analysis and interpretation.

  • Strong communication skills and a bias toward clarity and honesty regarding results.

  • High agency and drive: You push projects forward, prioritize effectively, and take initiative.

  • PhD in ML preferred, or equivalent practical experience.

  • Experience with RL for sequence models, post-training, preference-based learning, or agentic systems.

  • Experience with modern research stacks (e.g., JAX/Flax or PyTorch) and scaling experiments.

  • Strong experimental taste: Good judgment regarding baselines, ablations, and what is worth testing.

  • Comfort with scaling, evaluation methodologies, and diagnosing complex failure modes.

  • A focus on craft: You care about doing excellent work while maintaining a high velocity.

Application

View listing at origin and apply!