We are seeking an AI Scientist, Safety to evaluate, enhance, and build safety mechanisms for our large language models (LLMs). This role involves identifying and addressing potential risks, biases, and misuses of LLMs, ensuring that our AI systems are ethical, fair, and beneficial to society. You will work to monitor models, prevent misuse, and ensure user well-being, applying your technical skills to uphold principles of safety, transparency, and oversight.
Location : Paris or London
Adversarial & Fairness Testing
• Design and execute adversarial attacks to uncover vulnerabilities in LLMs.
• Evaluate potential risks and harms associated with LLM outputs.
• Assess LLMs for biases and unfairness in their responses, and develop strategies to mitigate these issues.
Tools & Monitoring
• Develop monitoring systems (eg. moderation tools) to detect unwanted behaviors in Mistral’s products.
• Build robust and reliable multi-layered defenses for real-time improvement of safety mechanisms that work at scale.
• Investigate and respond to incidents involving LLM misuse or harmful outputs, and develop post-incident recommendations.
• Analyze user reports of inappropriate content or accounts.
• Contribute to the development of AI ethics policies and guidelines that govern the responsible use of LLMs.
Safety Fine Tuning
• Work on safety tuning to improve robustness of models.
• Collaborate with the AI development team to create and implement safety measures, such as content filters, moderation tools, and model fine-tuning techniques.
• Keep up-to-date with the latest research and trends in AI safety, LLMs, and responsible AI, and continuously improve our safety practices.
• Build robust and reliable multi-layered defenses for real-time improvement of safety mechanisms that work at scale