Google DeepMind · San Francisco/New York City

Research Scientist, Generative Worlds

11/1/2025

Description

Key responsibilities: Conduct research to build generative multimodal models of the 3D world. Solve the essential problems involved in training world models at massive scale: build and train large-scale systems for data annotation, curate and annotate training datasets, build and maintain large-scale model training infrastructure, develop scaling ladders and training recipes, develop metrics for spatial intelligence, enable real-time interactive experiences, study the integration of spatial modalities with multimodal language models, and, of course, actually train massive-scale models.

Areas of focus:

  • 3D computer vision, spatial annotation systems
  • Spatial representations
  • Training large-scale transformers
  • Generative pixel and latent models
  • Infrastructure for large-scale data pipelines and annotation
  • Quantitative evals for spatial accuracy and intelligence
  • Model scaling, efficiency, distillation, training infrastructure

Qualifications

  • MSc or PhD in computer science or machine learning, or equivalent industry experience.
  • Experience with large-scale transformer models and/or large-scale data pipelines.
  • Track record of releases, publications, and/or open source projects relating to video generation, world models, multimodal language models, or transformer architectures.
  • Exceptional engineering skills in Python and deep learning frameworks (e.g., JAX, TensorFlow, PyTorch), with a track record of building high-quality research prototypes and systems.
  • Demonstrated experience in large-scale training of multimodal generative models.
  • Experience building training codebases for large-scale video or multimodal transformers.
  • Expertise optimizing efficiency of distributed training systems and/or inference systems.
  • Strong background in 3D representations or 3D computer vision.
  • Strong publication record at top-tier machine learning, computer vision, and graphics conferences (e.g., NeurIPS, ICLR, ICML, SIGGRAPH, CVPR, ICCV).
  • A keen eye for visual aesthetics and detail, coupled with a passion for creating high-quality, visually compelling generative content.
