Software Engineer, ML Performance


About the Team

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things they've never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.

About the Role

We're looking for an experienced performance engineer to join the Inference Engineering team. This role is focused on improving the efficiency, speed, and reliability of our inference systems. You'll work across the stack to identify bottlenecks, implement optimizations, and help us get the most out of our hardware.

Responsibilities

Profile system performance and identify issues across GPU, CPU, memory, and networking layers
Work with engineers and researchers to improve throughput, latency, and reliability in our inference stack
Build tooling and infrastructure to make performance issues easier to detect and debug
Drive efforts to optimize core components, including kernel usage, data movement, and scheduling
Own performance investigations end-to-end, from debugging to implementation
Qualifications

5+ years of experience in performance engineering or a related role
Strong background in systems-level debugging, profiling, and optimization
Proficiency in Python and/or C++
Experience working with performance-critical systems (e.g., games, VFX, HFT, distributed systems)
Familiarity with GPUs and performance tooling (e.g., CUDA, Nsight, perf, flame graphs) is a plus
Comfortable working cross-functionally and independently identifying high-impact work
Location: San Francisco