Software Engineer, ML Performance


About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things they've never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.

About the Role

We're looking for an experienced performance engineer to join the Inference Engineering team. This role is focused on improving the efficiency, speed, and reliability of our inference systems. You'll work across the stack to identify bottlenecks, implement optimizations, and help us get the most out of our hardware.

Responsibilities

Profile system performance and identify issues across GPU, CPU, memory, and networking layers
Work with engineers and researchers to improve throughput, latency, and reliability in our inference stack
Build tooling and infrastructure to make performance issues easier to detect and debug
Drive efforts to optimize core components, including kernel usage, data movement, and scheduling
Own performance investigations end-to-end, from debugging to implementation
Qualifications

5+ years of experience in performance engineering or a related role
Strong background in systems-level debugging, profiling, and optimization
Proficiency in Python and/or C++
Experience working with performance-critical systems (e.g., games, VFX, HFT, distributed systems)
Familiarity with GPUs and performance tooling (e.g., CUDA, Nsight, perf, flame graphs) is a plus
Comfortable working cross-functionally and independently identifying high-impact work
Location: San Francisco