Description
We're hiring research engineers to define and execute the Labs research agenda. You'll scope your own projects, run experiments end-to-end, and decide when an idea is ready to hand off to a production team — or when to kill it and move on. The team is small and deliberately built around a roughly 3:1 ratio of researchers to software engineers, so each person has substantial latitude over what they work on and high leverage on the team's direction.
Responsibilities:
- Lead and contribute to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and other safety needs.
- Design and run offline analyses over model usage data to surface abuse patterns, build classifiers and detection systems, and evaluate their effectiveness.
- Develop and iterate on prototypes that could eventually feed signals into the real-time safeguards path, partnering with engineers on tech transfer.
- Contribute to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentic workflows, and for training the model to robustly refrain from dangerous responses or behaviors without over-refusing.
- Build evaluations and methodologies for measuring whether safeguards actually work, including in agentic settings.
- Write up findings clearly so they inform decisions across Trust & Safety, research, and product teams.