Mistral AI · Paris

Applied AI, Evaluation Engineer

1/21/2026

Description

 
- Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
- Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
- Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics
- Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
- Collaborate with research teams to translate evaluation insights into model improvements and training decisions
- Partner with product teams to continuously improve our evaluation tooling based on customer feedback
 
How We Work in Applied AI
 
- We care about people and outputs.
- What matters is what you ship, not the time you spend on it.
- Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to. The best idea wins, whether it comes from a principal engineer or someone in their first week.
- Always ask why. The best solutions come from deep understanding, not from copying what worked before.
- We say what we mean. Feedback is direct, timely, and given because we care.
- No politics. Low ego, high standards.
- We embrace an unstructured environment and find joy in it.
 

Qualifications

 
- You are fluent in English
- You have 3+ years of experience in ML evaluation and benchmarking for LLMs or agentic systems
- You have proven experience implementing AI or machine learning products, including APIs and back-end systems
- You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
- You have strong technical coding skills in Python
- You have strong communication skills and can explain complex technical concepts in simple terms to both technical and non-technical audiences
 
Ideally you have:
 
- Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
- Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
- Experience with ML frameworks (PyTorch, HuggingFace Transformers)
 

Benefits

 
🏝️ PTO: The CDI contract is a "Forfait 218 jours", which corresponds to 25 days of holiday plus, on average, 8 to 10 RTT days, with complete autonomy over working hours
⚕️ Health: Full health insurance coverage for you and your family
🚗 Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling.
🥕 Food: Swile meal vouchers worth €10.83 per day worked, of which 60% is covered by the company
🏀 Sport: Gymlib, with Mistral sponsoring a significant part of the monthly fee (depending on the program you choose)
🐤 Parental policy: 4 additional weeks of leave for parents on top of what is offered by the French state.
 
By applying, you agree to our Applicant Privacy Policy.

