Senior Site Reliability Engineer (SRE)


A leading client is looking for a Senior Site Reliability Engineer (SRE) to lead efforts to ensure the reliability, scalability, and performance of critical production systems. This is a hybrid role based in New York, requiring onsite presence from Day 1.
As a Senior SRE, you’ll partner with architecture, engineering, and security teams to drive operational excellence through observability, automation, and incident response best practices.
Key Responsibilities:
Design and develop enterprise-grade APIs and configuration management solutions
Drive enterprise and application architecture improvements
Lead initiatives around monitoring, alerting, dashboarding, and incident response
Build and maintain observability tooling using Grafana, Prometheus, and Splunk
Develop and manage detailed runbooks for operational procedures
Define and monitor SLAs, SLOs, and KPIs for mission-critical services
Evaluate new tools and technologies to improve system performance and reliability
Collaborate cross-functionally with development, infrastructure, and security teams
Required Skills & Qualifications:
Strong background in IT infrastructure, cloud platforms (AWS, Azure, or GCP), and modern SRE practices
Proven experience in building APIs and backend systems
Solid understanding of enterprise/application architecture
Hands-on experience with:
Monitoring & Observability: Grafana, Prometheus, Splunk
ITSM & Operations: ServiceNow, OpsRamp
Incident Management: JIRA
Experience in:
Managing large-scale distributed systems
Building alerts, dashboards, and operational runbooks
Excellent leadership, communication, and problem-solving skills
Preferred:
Exposure to OpenShift and Azure
Certifications such as SRE Foundation, ITIL, or relevant cloud certifications (AWS, Azure, GCP)
Location:
New York City
