Google DeepMind · Mountain View

Research Engineer (or Scientist), Speech and Language

11/27/2025

Description

We are looking for a top-notch Research Scientist or Research Engineer to join a fast-paced audio and language team that is fundamental to Gemini audio, music and speech generation, representation learning, and diffusion theory projects. We're looking for someone hungry to dive into both theory and code: someone who will drive independent research initiatives, collaborate with teams on large-scale AI, and develop solutions to fundamental questions in machine learning and AI.

Key responsibilities:

Design, rapidly implement in code, and rigorously evaluate cutting-edge deep learning algorithms and data curation for multimodal generative AI, with a particular emphasis on audio and video synthesis.
Report and present research findings clearly and efficiently, both internally and externally, verbally and in writing.
Thrive under uncertainty, driving team collaborations toward ambitious research goals while also making significant individual contributions.

Qualifications

MS or PhD in Computer Science, Artificial Intelligence, Machine Learning, Computer Vision, Speech Processing, or equivalent practical experience.
Proven experience in deep learning research and development, particularly in generative AI for video and audio synthesis, including diffusion models and autoregressive generative models.
Exceptional engineering skills in Python and deep learning frameworks (e.g., JAX, TensorFlow, PyTorch), with a track record of building high-quality research prototypes and systems. Self-motivated to pick up new technologies, adapt, and move quickly.
Strong publication record at top-tier machine learning, computer vision, and graphics conferences (e.g., NeurIPS, ICLR, ICML, SIGGRAPH, CVPR, ICCV).

Knowledge of probabilistic machine learning and generative modeling (e.g., diffusion models, autoregressive models, GANs, flows, hierarchical VAEs, DDPMs).
Demonstrated experience in large-scale training of multimodal generative models.
Experience with sequence processing in TensorFlow, PyTorch, or JAX.
Bonus: knowledge of speech processing and language understanding, in particular text-to-speech synthesis and prosody modeling.
