Software Engineer - Observability

About xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Team

The Observability team builds and operates the core infrastructure that enables engineers to monitor, debug, and optimize the performance and reliability of their systems. We handle telemetry at massive scale - billions of time series and petabytes of logs - with strict performance and availability requirements.

About the Role

You will be part of the small, high-impact team responsible for building and maintaining X's observability platform.
You'll own critical systems that power metrics, logs, tracing, and alerting, enabling engineering teams to operate services at scale, identify issues before they impact users, and drive systemic reliability improvements.

What You'll Do

- Design and implement scalable observability infrastructure for metrics, logging, and tracing.
- Build high-performance telemetry pipelines that handle massive ingestion volumes.
- Develop APIs, query engines, and UIs that allow engineers to get real-time insights into their services.
- Define and enforce best practices for instrumentation, alerting, and reliability across the company.
- Partner with infrastructure and product teams to deeply integrate observability into our internal platforms.
- Own the reliability, scalability, and performance of the observability stack end-to-end.

Ideal Candidate

- Production-level proficiency in Go, Rust, Scala, or a similar language.
- Deep understanding of distributed systems and telemetry architecture.
- Experience building and operating infrastructure at scale.
- Familiarity with observability stacks such as Prometheus, Grafana, OpenTelemetry, VictoriaMetrics, or ClickHouse.
- Experience with Kafka, Redis, or large-scale time series databases.
- Experience operating observability pipelines in Kubernetes or similar orchestration environments.

Locations

We hire engineers in Palo Alto and San Francisco. Our team usually works from the office 5 days a week but allows work-from-home days when required. Candidates who join in San Francisco must make it to Palo Alto at least twice a week.

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions.
If you clear the initial phone interview, you will enter the main process, which consists of 2 technical interviews and 1 project deep-dive interview:

- Practical coding assessment in a language of your choice.
- Systems design hands-on: Demonstrate practical skills in a live problem-solving session.
- Project deep-dive: Present and answer questions about exceptional work that you've done.
- Meet and greet with the wider team.

Our goal is to finish the main process within one week. Final interviews will be conducted in person.

Annual Salary Range

$180,000 - $440,000 USD

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice

Location:
Stanford, CA, United States
Category:
Computer And Mathematical Occupations
