Sr. Site Reliability Engineer

Job Title: Site Reliability Engineer
Job Description
We are seeking a highly skilled Site Reliability Engineer with extensive experience in managing large-scale microservice-based systems. You will play a pivotal role in ensuring high availability and implementing best practices in reliability engineering. In collaboration with development and operations teams, you will enhance our infrastructure and improve system performance while being mindful of cost-effectiveness.
Responsibilities
• Proactively identify performance improvements in areas such as responsiveness, availability, and scalability.
• Establish and promote best practices around observability, monitoring, and incident response.
• Lead incident response efforts and conduct post-mortem analyses to prevent future occurrences.
• Coordinate with Software Engineering and DevOps teams to design, implement, and maintain scalable and reliable systems using Kubernetes, Docker, and Istio.
• Monitor system performance and troubleshoot issues proactively, utilizing Datadog for observability.
• Implement and tune Horizontal Pod Autoscalers (HPAs) to optimize resource utilization.
• Develop and maintain automation tools for deployment, monitoring, and incident response.
• Collaborate with software engineering teams to improve system reliability and performance.
• Implement A/B deployments, canary deployments, and traffic mirroring strategies for smooth critical updates.
• Mentor junior engineers and contribute to team knowledge sharing.
• Oversee and coordinate with SREs globally, ensuring effective collaboration during on-call rotations.
• Establish and enforce best practices for system reliability and performance across the organization.
• Utilize Helm charts for application deployment and manage AWS systems, including AWS Load Balancers and routing, to support high-volume systems.
• Participate in on-call rotations and provide support for production systems.
Essential Skills
• 5+ years of production experience as a Site Reliability Engineer, DevOps Engineer, or Software Engineer.
• Demonstrated ability to deliver highly available solutions at scale.
• Advanced problem-solving, troubleshooting, and decision-making skills.
• Expertise in containerization, orchestration, and service mesh technologies such as Docker, Kubernetes, and Istio.
• Proficiency in AWS.
• Experience with Argo CD for continuous delivery and GitOps practices.
• Proficiency in monitoring and alerting tools such as Datadog, AppDynamics, ELK, or Prometheus.
• Familiarity with A/B, canary, and blue/green deployments and with traffic mirroring techniques.
• Experience with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or equivalent.
• Ability to balance cost considerations with performance and reliability.
• Experience in delegating tasks and leading initiatives.
• Ability to apply systems thinking to understand interdependencies and design solutions.
• Excellent verbal and written communication skills.
Additional Skills & Qualifications
• Proficiency in Golang or Rust is a plus but not required.
• Experience mentoring and providing technical guidance to junior team members.
• Ability to work independently and take ownership of tasks.
• Organized and detail-oriented.
• Ability to develop healthy working relationships and collaborate with peers and leaders.
• Exhibits integrity and high standards in work quality.
• Values diversity and differences amongst individuals in interactions.
Work Environment
This position is based in Plano, Texas. Employees are encouraged to live within a reasonable commuting distance of their assigned work location for hybrid work. We celebrate and are committed to a diverse and inclusive workplace, providing reasonable accommodations as necessary.
Pay and Benefits
The pay range for this position is $50.00 - $80.00/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
• Medical, dental & vision
• Critical Illness, Accident, and Hospital
• 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
• Life Insurance (Voluntary Life & AD&D for the employee and dependents)
• Short and long-term disability
• Health Spending Account (HSA)
• Transportation benefits
• Employee Assistance Program
• Time Off/Leave (PTO, Vacation or Sick Leave)
Workplace Type
This is a hybrid position in Plano, TX.
Application Deadline
This position is anticipated to close on Jul 31, 2025.
Location: Plano, TX