Description
Design and develop machine learning and LLM-based solutions for ML model and system evaluation use cases such as:
* Automatic large-scale data generation
* Automatic UI and non-UI test evaluation
* Running evaluation jobs at scale
* Building and optimizing LLM judges (a minimal sketch follows this list)
* Intelligent log summarization and anomaly detection
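
To make the LLM-judge item concrete, here is a minimal sketch, assuming an OpenAI-compatible client; the model name, rubric wording, and 1-5 scale are illustrative placeholders, not a prescribed design:

```python
# Minimal LLM-judge sketch: score a candidate answer against a reference
# on a 1-5 scale. The model name and rubric are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial evaluator.
Rate how well the candidate answer matches the reference answer.
Reply with a single integer from 1 (wrong) to 5 (equivalent).

Reference answer: {reference}
Candidate answer: {candidate}
Score:"""

def judge(reference: str, candidate: str, model: str = "gpt-4o-mini") -> int:
    """Return a 1-5 judge score for one (reference, candidate) pair."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic scoring for reproducible eval runs
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(reference=reference, candidate=candidate),
        }],
    )
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    print(judge("Paris is the capital of France.",
                "The capital of France is Paris."))
```

In practice, a judge like this would be calibrated against human labels before being trusted at scale.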
Fine-tune or prompt-engineer foundation models (e.g., Apple, GPT, Claude) for evaluation-specific applications.
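
As a hedged illustration of the fine-tuning side, the sketch below converts human-scored evaluation pairs into the chat-style JSONL format accepted by several fine-tuning APIs; the records and file name are invented for the example:

```python
# Sketch of preparing human-scored evaluation pairs as chat-format JSONL
# for supervised fine-tuning. Field names follow the OpenAI chat schema;
# the two records and the output file name are invented for illustration.
import json

labeled_examples = [
    {"reference": "2 + 2 = 4", "candidate": "The answer is 4.", "score": 5},
    {"reference": "2 + 2 = 4", "candidate": "The answer is 5.", "score": 1},
]

with open("judge_finetune.jsonl", "w") as f:
    for ex in labeled_examples:
        record = {
            "messages": [
                # The user turn carries the evaluation context...
                {"role": "user",
                 "content": (f"Reference: {ex['reference']}\n"
                             f"Candidate: {ex['candidate']}\nScore:")},
                # ...and the assistant turn carries the target score.
                {"role": "assistant", "content": str(ex["score"])},
            ]
        }
        f.write(json.dumps(record) + "\n")
```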
Collaborate with QA teams to integrate models into testing frameworks.
Continuously evaluate and improve model performance through A/B testing, human feedback loops, and retraining.
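
One way such an A/B comparison might look: score the same items with two judge variants, measure each variant's agreement with the human labels, and bootstrap a confidence interval on the difference. The score lists below are stand-in data, not real results:

```python
# Illustrative A/B comparison of two judge variants against human labels:
# per-variant agreement rate plus a bootstrap CI on the difference.
# The three score lists are stand-in data, not real evaluation output.
import random

human   = [5, 4, 1, 3, 5, 2, 4, 1, 5, 3]  # human gold scores
judge_a = [5, 4, 2, 3, 5, 2, 4, 1, 4, 3]  # variant A on the same items
judge_b = [5, 3, 1, 3, 4, 2, 4, 2, 5, 3]  # variant B on the same items

def agreement(pred, gold):
    """Fraction of items where the judge matches the human label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def bootstrap_diff_ci(a, b, gold, n_boot=10_000, seed=0):
    """95% bootstrap CI on agreement(a) - agreement(b), resampling items."""
    rng = random.Random(seed)
    idx = list(range(len(gold)))
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]  # resample with replacement
        diffs.append(
            agreement([a[i] for i in sample], [gold[i] for i in sample])
            - agreement([b[i] for i in sample], [gold[i] for i in sample])
        )
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

print(f"A: {agreement(judge_a, human):.2f}  B: {agreement(judge_b, human):.2f}")
print("95% CI on A-B difference:", bootstrap_diff_ci(judge_a, judge_b, human))
```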
Monitor advances in LLMs and NLP, and propose innovative applications within the ML evaluation domain.