Description
As an engineer in this role, you will focus primarily on building performance infrastructure that presents high-level views of ML inference behavior, created by gathering lower-level data from execution delegates and relating that data back to the high-level model. You will work with models created by the most popular ML frameworks (PyTorch, JAX, MLX, etc.) and analyze their execution to ensure the stack achieves full machine performance on Apple Silicon. The role also involves building higher-level APIs and tooling that enable developers to visualize, diagnose, and debug correctness and performance issues while onboarding models for on-device deployment.
We are building the first end-to-end developer experience for ML development that, by taking advantage of Apple’s vertical integration, allows developers to iterate on model authoring, optimization, transformation, execution, debugging, profiling, and analysis. Providing ML developers with actionable feedback on the details of their model’s inference behavior is crucial to achieving full machine performance.
The role requires an understanding of ML architectures, compilers, runtimes, system performance, and system software engineering.
Key responsibilities:
* Build production-critical system software that tracks low-level details of ML model execution on Apple Silicon and associates them with ops from the high-level model (see the sketch after this list).
* Optimize model execution for various system goals such as performance, memory usage, and energy efficiency.
* Contribute to maintaining the health and performance of the ML benchmarking service, including debugging failures and addressing user questions and requests.
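To make the first responsibility concrete, here is a minimal, hypothetical sketch of the core idea: rolling per-op timings reported by a low-level execution delegate up to the ops of the high-level model so results are reported in terms the model author recognizes. All names (TraceEvent, GraphOp, summarize_per_op, the delegate op IDs) are illustrative assumptions, not the team's actual tooling or APIs.

```python
# Hypothetical sketch: associate low-level delegate trace events with
# high-level model ops and report per-op execution time.
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """A timing record emitted by an execution delegate (names are illustrative)."""
    delegate_op_id: str   # identifier assigned by the low-level runtime
    duration_us: float    # measured execution time in microseconds


@dataclass
class GraphOp:
    """An op in the high-level (framework-level) model graph."""
    name: str                     # e.g. "decoder.layers.3.self_attn.matmul"
    delegate_op_ids: list[str]    # low-level ops this framework op was lowered to


def summarize_per_op(graph_ops: list[GraphOp], events: list[TraceEvent]) -> dict[str, float]:
    """Roll delegate-level timings up to high-level ops (microseconds per op)."""
    timings = {e.delegate_op_id: e.duration_us for e in events}
    return {
        op.name: sum(timings.get(i, 0.0) for i in op.delegate_op_ids)
        for op in graph_ops
    }


if __name__ == "__main__":
    ops = [GraphOp("attn.matmul", ["d0", "d1"]), GraphOp("mlp.gelu", ["d2"])]
    trace = [TraceEvent("d0", 120.0), TraceEvent("d1", 35.5), TraceEvent("d2", 18.2)]
    print(summarize_per_op(ops, trace))
    # {'attn.matmul': 155.5, 'mlp.gelu': 18.2}
```

In practice the mapping from framework ops to delegate ops comes from the compilation and lowering pipeline, and the timings come from hardware- and runtime-level instrumentation; the sketch only illustrates the correlation step described above.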