Senior Site Reliability Engineer (SRE), New York City

A leading client is looking for a Senior Site Reliability Engineer (SRE) to lead efforts in ensuring the reliability, scalability, and performance of critical production systems. This is a hybrid role based in New York, requiring onsite presence from Day 1.

As a Senior SRE, you’ll partner with architecture, engineering, and security teams to drive operational excellence through observability, automation, and incident response best practices.

Key Responsibilities

Design and develop enterprise-grade APIs and configuration management solutions

Drive enterprise and application architecture improvements

Lead initiatives around monitoring, alerting, dashboarding, and incident response

Build and maintain observability with Grafana, Prometheus, and Splunk

Develop and manage detailed runbooks for operational procedures

Define and monitor SLAs, SLOs, and KPIs for mission-critical services

Evaluate new tools and technologies to improve system performance and reliability

Collaborate cross-functionally with development, infrastructure, and security teams

Required Skills & Qualifications

Strong background in IT infrastructure, cloud platforms (AWS, Azure, or GCP), and modern SRE practices

Proven experience in building APIs and backend systems

Solid understanding of enterprise/application architecture

Hands-on experience with:

Monitoring & Observability: Grafana, Prometheus, Splunk

ITSM & Operations: ServiceNow, OpsRamp

Incident Management: JIRA

Experience with:

Managing large-scale distributed systems

Building alerts, dashboards, and operational runbooks

Excellent leadership, communication, and problem-solving skills

Preferred Qualifications

Exposure to OpenShift and Azure

Certifications such as SRE Foundation, ITIL, and relevant cloud certifications (AWS, Azure, GCP)

Location: New York City