Description
We have strong evidence that conversational environments lead to better learning in a transferable way. However, we need to go beyond text. As a Research Engineer, you will play a pivotal role in expanding Meta Reinforcement Learning to multimodal setups. You will help us leapfrog current industry benchmarks by extending our focus from verifiable domains to semi-verifiable, multimodal domains (e.g., Lens, image-grounded reasoning).
This is an ecosystem play: you will leverage our advantages in autoraters and autousers to scale the creation of these conversational environments. You will be the bridge between the core conversational work and the specifics of grounding in the visual domain, moving our training infra from static data towards dynamic, multi-turn environments.
Key responsibilities
- Multimodal RL Research: Design and implement novel RL algorithms that enable multi-turn reasoning and learning in multimodal (text + vision) environments.
- Environment Scaling: Contribute to the "ecosystem" of autoraters and autousers, building the infrastructure needed to generate high-quality, semi-verifiable training environments at scale.
- Strategic Application: Apply state-of-the-art methods to solve strategic problems, specifically closing the gap between single-turn and multi-turn embeddings (retrieval-augmented reasoning).
- Experimentation & Analysis: Track, interpret, and analyze complex experiments, bringing scientific rigor to our training pipelines.
- Collaboration: Act as a connector between teams (Google Research, Core, GDM GenAI), helping to build shared pipelines for conversational infrastructure that serve product needs in Search, Lens, and YouTube.
What we can offer you
- Scientific Contribution: The opportunity to publish and contribute to the scientific community, specifically in the high-impact intersection of RL, Multimodality, and Reasoning.
- Scale & Resources: Access to world-class compute and the existing infrastructure of autoraters/autousers, allowing you to focus on innovation rather than building from scratch.
- Direct Impact: Your work will directly influence the reasoning capabilities of Google’s flagship models (Gemini), moving the needle on how models learn and interact with the world.
- Collaborative Culture: Work alongside world-leading experts in RL and Generative AI in a supportive, growth-oriented environment.