Description
As a Research Engineer on the Model Performance team, you will help solve one of our greatest challenges: systematically understanding and monitoring model quality in real time. This role blends research and engineering responsibilities, requiring you to train production models, develop robust monitoring systems, and create novel evaluation methodologies.
Note: For this role, we conduct all interviews in Python.
Representative Projects
- Build comprehensive training observability systems - Design and implement monitoring infrastructure to track how model behaviors evolve throughout training.
- Develop next-generation evaluation frameworks - Move beyond traditional benchmarks to create evaluations that capture real-world utility.
- Create automated quality assessment pipelines - Build custom classifiers to continuously monitor RL transcripts for complex issues.
- Bridge research and production - Partner with research teams to translate cutting-edge evaluation techniques into production-ready systems, and work with engineering teams to ensure our monitoring infrastructure scales with increasingly complex training workflows.