You will be part of the REMY (Research & dEvelopment in Multimodal technologY) team in the Media Understanding organization at Google DeepMind. In this role, you will push forward state-of-the-art research in multimodal AI representation models, building on recent advances in multimodal foundation models. You'll be at the forefront of developing models that power Google products used by billions of people worldwide. Your work will directly impact how these products understand and interact with images, text, and video. This is a unique opportunity to shape the future of multimodal AI and its applications in a dynamic and impactful environment.
We are a team of research/software engineers, research scientists, and machine learning experts, working together to enable superhuman understanding of the multimodal world.
You'll develop the next generation of state-of-the-art models for multimodal understanding. Your work will include researching new modeling techniques, implementing research ideas, running experiments to evaluate improvements, and identifying new opportunities.
As a member of the Media Understanding team, you will be responsible for conducting fundamental and applied research in multimodal AI (computer vision, language understanding, machine learning, and related areas).
The US base salary range for this full-time position is between $141,000 and $202,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.