OpenAI · San Francisco · Hybrid

Software Engineer, GPU Inference

4/21/2025

Description

The Sora team is pioneering multimodal capabilities for OpenAI’s foundation models. We’re a hybrid research and product team focused on integrating multimodal functionalities into our AI products, ensuring they are reliable, user-friendly, and aligned with our mission of broad societal benefit.

About the Role

We’re looking for a GPU Inference Engineer to improve model serving efficiency for Sora. This is a high-impact role in which you’ll drive initiatives to optimize inference performance and scalability. You’ll also engage in model design, helping our researchers develop inference-friendly models.

This role is critical to advancing the team’s broader goals: by building a stronger technical foundation, you will directly enable leadership to focus on higher-leverage initiatives.

In this role you will:

  • Lead engineering efforts to improve model serving, inference performance, and system efficiency

  • Drive optimizations from a kernel and data movement perspective to improve system throughput and reliability

  • Partner closely with research and product teams to ensure our models perform effectively at scale

  • Design, build, and improve critical serving infrastructure to support Sora’s growth and reliability needs

Qualifications

  • Deep expertise in model performance optimization, particularly at the inference layer

  • A strong background in kernel-level systems, data movement, and low-level performance tuning

  • Excitement about scaling high-performing AI systems that serve real-world, multimodal workloads

  • The ability to navigate ambiguity, set technical direction, and drive complex initiatives to completion

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

Application

View the original listing and apply!