AI Infrastructure Engineer - PlayerZero


A stealth-stage AI infrastructure company is building a self-healing system for software that automates defect resolution and development. Engineering and support teams use the platform to:

- Autonomously debug problems in production software
- Fix issues directly in the codebase
- Prevent recurring issues through intelligent root-cause automation

The company is backed by top-tier investors such as Foundation Capital, WndrCo, and Green Bay Ventures, as well as prominent operators including Matei Zaharia, Drew Houston, Dylan Field, Guillermo Rauch, and others.

We believe that as software development accelerates, the burden of maintaining quality and reliability shifts heavily onto engineering and support teams. This challenge creates a rare opportunity to reimagine how software is supported and sustained, with AI-powered systems that respond autonomously.

About the Role

We're looking for an experienced backend/infrastructure engineer who thrives at the intersection of systems and AI, and who loves turning research prototypes into rock-solid production services.
You'll design and scale the core backend that powers our AI inference stack, from ingestion pipelines and feature stores to GPU orchestration and vector search. If you care deeply about performance, correctness, observability, and fast iteration, you'll fit right in.

What You'll Do

- Own mission-critical services end-to-end, from architecture and design reviews to deployment, observability, and service-level objectives.
- Scale LLM-driven systems: build RAG pipelines, vector indexes, and evaluation frameworks handling billions of events per day.
- Design data-heavy backends: streaming ETL, columnar storage, and time-series analytics, all fueling the self-healing loop.
- Optimize for cost and latency across compute types (CPUs, GPUs, serverless); profile hot paths and squeeze out milliseconds.
- Drive reliability: implement automated testing, chaos engineering, and progressive rollout strategies for new models.
- Work cross-functionally with ML researchers, product engineers, and real customers to build infrastructure that matters.

You Might Thrive in This Role If You:

- Have 2–5+ years of experience building scalable backend or infrastructure systems in production environments
- Bring a builder mindset: you like owning projects end-to-end and thinking deeply about data, scale, and maintainability
- Have transitioned ML or data-heavy prototypes to production, balancing speed and robustness
- Are comfortable with data engineering workflows: parsing, transforming, indexing, and querying structured or unstructured data
- Have some exposure to search infrastructure or LLM-backed systems (e.g., document retrieval, RAG, semantic search)

Bonus Points

- Experience with vector databases (e.g., pgvector, Pinecone, Weaviate) or inverted-index search (e.g., Elasticsearch, Lucene)
- Hands-on experience with GPU orchestration (Kubernetes, Ray, KServe) or model-parallel inference tuning
- Familiarity with Go / Rust (our primary stack), with some TypeScript for light full-stack tasks
- Deep knowledge of observability tooling (OpenTelemetry, Grafana, Datadog) and profiling distributed systems
- Contributions to open-source ML or systems infrastructure projects
Location:
San Francisco, CA, United States