Anthropic · San Francisco/New York City · Hybrid

Machine Learning Engineer, Safeguards

10.10.2025

Description

We are looking for ML engineers to help build safety and oversight mechanisms for our AI systems. As a Safeguards Machine Learning Engineer, you will train models that detect harmful behaviors and help ensure user well-being. You will apply your technical skills to uphold our principles of safety, transparency, and oversight while enforcing our terms of service and acceptable use policies.

Responsibilities:

  • Build machine learning models to detect unwanted or anomalous behaviors from users and API partners, and integrate them into our production system
  • Improve our automated detection and enforcement systems as needed
  • Analyze user reports of inappropriate accounts and build machine learning models to detect similar instances proactively
  • Surface abuse patterns to our research teams to harden models at the training stage

Qualifications

  • Have 4+ years of experience in a research/ML engineering or applied research scientist position, preferably with a focus on AI safety.
  • Are proficient in Python, SQL, and data analysis/data mining tools, and have experience working with LLMs.
  • Have experience building safe AI/ML systems, such as behavioral classifiers or anomaly detection.
  • Have strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders.
  • Care about the societal impacts and long-term implications of your work.

Strong candidates may also have experience with:

  • Machine learning frameworks like Scikit-Learn, TensorFlow, or PyTorch
  • High-performance, large-scale ML systems
  • Language modeling with transformers
  • Reinforcement learning
  • Large-scale ETL

Benefits

$315,000 - $425,000 USD

Application

View listing at origin and apply!