Senior Site Reliability Engineer - Central Platforms
We are seeking a Site Reliability Engineer (SRE) to join our Internal Platform Services team, responsible for the reliability, scalability, and performance of the core services that power our internal engineering ecosystem. You will work at the intersection of development and operations, enabling product teams to move quickly and safely by building and maintaining robust, self-service infrastructure components like Kubernetes clusters, internal databases, CI/CD pipelines, observability tools, and cloud APIs.
Key Responsibilities
Ensure reliability, scalability, and performance of services through SLIs/SLOs, capacity planning, and incident response.
Drive automation of infrastructure operations to minimize toil.
Develop and support monitoring, alerting, and observability systems for proactive issue detection.
Partner with internal engineering teams to define service-level objectives, improve deployment workflows, and integrate infrastructure with development needs.
Contribute to on-call rotations and incident management, helping ensure high availability of services.
Drive post-incident reviews and blameless retrospectives to improve reliability.
Stay current with emerging technologies and recommend improvements to existing systems and practices.
Qualifications
Required:
3+ years of experience as an SRE, DevOps Engineer, or Infrastructure Engineer.
Solid experience with Kubernetes administration and tooling (e.g., Helm, ArgoCD, Kustomize).
Strong expertise in cloud platforms (e.g., AWS, GCP, or Azure).
Experience managing databases in production environments (e.g., backups, replication, tuning).
Proficiency in programming or scripting (e.g., Go, Python, Bash).
Deep understanding of CI/CD pipelines and infrastructure automation.
Familiarity with monitoring/observability tools (e.g., Prometheus, Grafana).
Strong communication skills and ability to collaborate with software engineering teams.
Preferred:
Experience in multi-tenant infrastructure environments.
Exposure to compliance and security best practices in infrastructure environments.
Why Join Us
Be a key driver of internal engineering productivity and reliability.
Work with modern, cloud-native technologies in a high-impact environment.
Join a collaborative, learning-focused team where your ideas shape the platform.
Competitive compensation, flexible work arrangements, and ongoing professional growth.
Location: Washington