Mistral AI · Paris

AI Engineer, Product

12/23/2025

Description


Embedded directly in a product team, such as search, chat, documents, or audio, you'll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You'll own your domain's AI quality end-to-end: define what "good" looks like, measure it, run experiments, and ship what works. You'll work with Science to deliver measurable improvements to quality, latency, safety, and reliability.



• Design and run evaluations for your product area: reference tests, heuristics, and model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
• Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.
• Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.
• Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.
• Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
• Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.
• Improve core behaviors in your product area, whether that's memory policies, intent classification, routing, tool-call reliability, or retrieval quality.
• Create templates and documentation so other teams can author evals and ship safely.
• Partner with Science to diagnose regressions and lead post-mortems.
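For a flavor of the eval work described above, here is a minimal sketch of a reference test combined with a model-graded check. All names (`EvalCase`, `run_eval`, the judge-prompt wording) are illustrative assumptions for this posting, not Mistral's internal tooling; the `judge` callable stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class EvalCase:
    prompt: str
    response: str
    reference: str  # known-good answer used by the reference test

def heuristic_check(case: EvalCase) -> bool:
    # Cheap reference test: does the response contain the reference answer?
    return case.reference.lower() in case.response.lower()

JUDGE_PROMPT = (
    "You are grading a chat answer.\n"
    "Question: {prompt}\nAnswer: {response}\nReference: {reference}\n"
    "Reply PASS if the answer is correct and helpful, otherwise FAIL."
)

def model_graded_check(case: EvalCase, judge: Callable[[str], str]) -> bool:
    # `judge` wraps an LLM call and returns its raw text verdict.
    verdict = judge(JUDGE_PROMPT.format(**case.__dict__))
    return verdict.strip().upper().startswith("PASS")

def run_eval(cases: Iterable[EvalCase], judge: Callable[[str], str]) -> float:
    # Pass rate doubles as a task-success metric to track on dashboards.
    cases = list(cases)
    passed = sum(heuristic_check(c) or model_graded_check(c, judge) for c in cases)
    return passed / len(cases)
```

In practice the pass rate from `run_eval` would feed the metrics, A/B decisions, and regression detection the bullets above describe.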

Qualifications



• 3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.
• Strong TypeScript or Python skills; we have both tracks depending on team fit.
• Production LLM experience: prompts, tool/function calling, system prompts.
• Hands-on with evals and A/B testing; you can design metrics, not just run them.
• Comfortable implementing directly in product code, not only notebooks.
• Observability experience: logging, tracing, dashboards, alerting.
• Product mindset: form hypotheses, run experiments, interpret results, ship.
• Clear communicator, autonomous, and oriented toward production impact over experimentation for its own sake.

It would be ideal if you also have:
• Safety systems experience: moderation, PII handling/redaction, guardrails.
• Release operations: canary/shadowing, automated rollbacks, experiment platforms.
• Prior work on search ranking, chat systems, document AI, or audio ML features.
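To illustrate the safety-systems item above, a minimal PII-redaction guardrail might look like the following. The patterns and the `redact_pii` name are illustrative assumptions; production systems rely on vetted PII detectors rather than hand-rolled regexes.

```python
import re

# Illustrative patterns only, not a complete or production-grade PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d .-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders before logging."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Redacting before structured logging keeps the observability pipeline described earlier free of raw user PII.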

Application

View the listing at its original posting and apply!