Performance Engineer

About the Team
We bring OpenAI's technology to the world through products like ChatGPT and the OpenAI API. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.
About the Role
OpenAI is looking for an experienced Performance Engineer to help us scale the performance, reliability, and efficiency of our systems. In this role, you'll apply deep technical expertise to optimize infrastructure and application-level performance across mission-critical products like ChatGPT and our developer API. You'll work cross-functionally with teams building core services, training models, and developing real-time user experiences to push our latency, throughput, and cost-efficiency to the next level.
We are looking for engineers who thrive in ambiguous environments, value deep systems understanding, and are motivated by delivering measurable impact. This is a highly technical, individual contributor role focused on root-cause analysis, profiling, instrumentation, and architecture-level performance improvements across our stack.
In this role, you will:
Analyze and optimize performance across application, middleware, runtime, and infrastructure layers, including networking, storage, the Python runtime, and GPU utilization.
Develop tooling and metrics that provide deep observability into system performance.
Collaborate closely with infra, platform, training, and product teams to identify key performance goals and drive systemic improvements.
Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
Lead investigations into high-impact performance regressions or scalability issues in production.
Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.
You might thrive in this role if you:
Have 7+ years of experience in software engineering with a strong track record in performance or reliability of high-scale distributed systems.
Are deeply comfortable with performance profiling tools and tracing systems.
Have experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Python/Golang internals, GPU utilization).
Have a strong understanding of OS internals, scheduling, memory management, and IO patterns.
Have contributed to observability, benchmarking, or performance-focused infrastructure at scale.
Have demonstrated success navigating ambiguity and aligning stakeholders around performance goals.
Value simplicity, rigor, and collaboration when solving complex systems problems.
Location:
San Francisco
