Mistral AI · Paris · Hybrid

AI Engineer, Product

12/23/2025

Description

Role summary
Embedded directly in a product team such as search, chat, documents, or audio, you'll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You'll own your domain's AI quality end-to-end: define what "good" looks like, measure it, run experiments, and ship what works. You'll work with Science to deliver measurable improvements to quality, latency, safety, and reliability.
  • Design and run evaluations for your product area: reference tests, heuristics, model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
  • Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.
  • Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.
  • Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.
  • Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
  • Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.
  • Improve core behaviors in your product area, whether that's memory policies, intent classification, routing, tool-call reliability, or retrieval quality.
  • Create templates and documentation so other teams can author evals and ship safely.
  • Partner with Science to diagnose regressions and lead post-mortems.

Qualifications

  • 3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.
  • Strong TypeScript or Python skills - we have both tracks depending on team fit.
  • Production LLM experience: prompts, tool/function calling, system prompts.
  • Hands-on with evals and A/B testing; you can design metrics, not just run them.
  • Comfortable implementing directly in product code, not only notebooks.
  • Observability experience: logging, tracing, dashboards, alerting.
  • Product mindset: form hypotheses, run experiments, interpret results, ship.
  • Clear communicator, autonomous, and oriented toward production impact over experimentation for its own sake.
  • Safety systems experience: moderation, PII handling/redaction, guardrails.
  • Release operations: canary/shadowing, automated rollbacks, experiment platforms.
  • Prior work on search ranking, chat systems, document AI, or audio ML features.

Application

View listing at origin and apply!
