
You will work closely with Ian Osband and the team on research into post-training for agents and LLMs, including practical RL methods and evaluation. This is not a theory-only role: you should expect to implement code, run experiments, and own results end-to-end. Success in this role is measured by whether the team learns faster and whether the work you produce is crisp, honest, and high-quality.
Key Responsibilities
Propose and test research hypotheses in post-training and RL for agents/LLMs.
Implement algorithm ideas and run end-to-end experiments, including setup, execution, analysis, and iteration.
Design evaluations and ablations that answer real questions and change minds.
Analyze results carefully, including debugging and failure analysis.
Communicate clearly through plots, writeups, and paper-ready narratives and figures.
Collaborate closely with engineering and research partners to keep the team aligned on findings and strategy.
Contribute to a culture of first-principles thinking, high standards, and direct, constructive feedback.
Qualifications
A research track record in ML/RL, demonstrated through publications or high-quality projects.
Strong implementation ability and comfort working in research codebases.
Evidence of owning experiments end-to-end, including analysis and interpretation.
Strong communication skills and a bias toward clarity and honesty regarding results.
High agency and drive: You push projects forward, prioritize effectively, and take initiative.
PhD in ML preferred, or equivalent practical experience.
Experience with RL for sequence models, post-training, preference-based learning, or agentic systems.
Experience with modern research stacks (e.g., JAX/Flax or PyTorch) and scaling experiments.
Strong experimental taste: Good judgment regarding baselines, ablations, and what is worth testing.
Comfort with scaling, evaluation methodologies, and diagnosing complex failure modes.
A focus on craft: You care about doing excellent work while maintaining a high velocity.