Site Reliability Engineer

Job Description

Site Reliability Engineer (SRE)

Who are we, and what do we do?     

Edible Brands® is an innovative, Atlanta-based company that acquires, develops, and manages a world-class portfolio of consumer and service brands. From our flagship brand, Edible Arrangements®, to the diverse portfolio under our umbrella—including Rōti Modern Mediterranean®, edible.com®, edibles.com™, freshfruit.com™, and BerryDirect®—we’ve revolutionized the way people experience food.  

By combining exceptional products with cutting-edge e-commerce platforms, proprietary software, and a robust supply chain, we create memorable and accessible experiences for customers across the globe. As a dynamic, forward-thinking company, we are constantly evolving to deliver high-quality, innovative solutions that resonate with our customers and franchisees worldwide. 

Location: This is an onsite role based at our Corporate Office in Sandy Springs, GA, with a Monday–Friday schedule.

Purpose:

As a Site Reliability Engineer (SRE), you will be responsible for ensuring the resilience and reliability of our e-commerce applications through monitoring, automation, and proactive site maintenance. You will leverage Datadog, Azure Application Insights, and other industry-standard tools to develop robust monitoring systems that enhance site awareness, detect and respond to incidents, and maintain high availability. You will also drive collaboration across engineering teams to build a proactive approach to system health, site reliability, and incident management.

  • Monitoring and Site Reliability
  • Incident Response and Site Health
  • System Maintenance and Improvement
  • Collaboration and Documentation
  • Continuous Improvement and Learning

Responsibilities:

  • Develop, implement, and manage monitoring and alerting systems using Datadog, Azure Application Insights, and other related technologies to gain real-time awareness of system health and potential issues.
  • Ensure integration of Datadog with .NET, Node.js, and React-based applications for comprehensive monitoring of application performance and health.
  • Establish proactive monitoring practices to reduce site outages, gain insight into system performance, and identify blockers within Azure DevOps pipelines.
  • Design and implement Standard Operating Procedures (SOPs) to effectively respond to and resolve incidents, minimizing downtime and ensuring prompt recovery.
  • Collaborate with engineering and product teams to establish and execute comprehensive incident response plans, focusing on improving the availability, performance, and reliability of e-commerce platforms.
  • Optimize Azure DevOps pipelines to ensure blockers, errors, and any build issues are proactively addressed, enhancing site deployment efficiency and reliability.
  • Maintain and improve application performance and resilience through enhancements in Azure App Services, Azure Front Door, and Azure Application Gateway.
  • Execute SQL queries to assess and troubleshoot database performance and availability issues related to the operational health of the site.
  • Work closely with developers to ensure that monitoring tools are embedded effectively into the development cycle and are aligned with the business needs.
  • Create detailed documentation, including SOPs, best practices, incident management guides, and monitoring configurations.
  • Stay current with emerging monitoring technologies and identify opportunities to apply them to enhance the platform's reliability and scalability.
  • Promote a culture of learning and proactive improvement through root cause analysis and post-incident reviews to prevent repeat occurrences.

Requirements:

  • 5+ years of experience in Site Reliability Engineering, preferably within an e-commerce or high-traffic web application environment.
  • Strong expertise with Datadog, including setting up integrations, creating custom metrics, dashboards, and alerts, specifically in .NET, Node.js, and React applications.
  • Proven experience with Azure Application Insights, Azure DevOps, and the ability to implement monitoring and alerting solutions in cloud environments.
  • Hands-on experience managing and optimizing Azure App Services, Azure Front Door, Azure Application Gateway, and SQL databases from a resilience and performance standpoint.
  • Familiarity with SOP development for incident management, proactive monitoring, and site reliability.
  • Knowledge of CI/CD pipelines in Azure DevOps, and experience in identifying and resolving build blockers and pipeline issues.
  • Strong skills in writing SQL queries to diagnose and resolve issues.

Essential Competencies:

  • Excellent interpersonal skills, with an emphasis on collaboration, clear communication, and the ability to explain technical concepts to non-technical stakeholders.
  • Ability to work in a fast-paced environment, with strong analytical and problem-solving skills, and a proactive mindset towards automation and improvement.

What will set you apart:

  • Advanced certifications in Azure (e.g., Azure DevOps Engineer Expert, Azure Solutions Architect).
  • Extensive experience with high-traffic e-commerce applications and a track record of ensuring uptime and resilience.
  • Experience with other monitoring and observability tools (e.g., Grafana, Prometheus) is a plus.

What We Offer:

  • Onsite work environment, fostering collaboration and relationship building with peers, cross-functional partners and leadership.
  • The stability and resources of an industry-leading company successfully operating for 25 years, with the agility and innovation of a startup, allowing you to make a significant impact and shape our future.
  • Growth & Development – Each team member has a visible and immediate impact on the business, offering abundant opportunities for personal and professional growth as we scale in size and sophistication.
  • Benefits including health, dental, and vision insurance; a 401(k) plan; company-paid life insurance and short-term disability; flexible spending account options; and more.
  • Paid time off, including sick days & holidays to support work-life balance.

We are proud to be an EEO/AA employer. Applicants for employment are considered without regard to race, creed, color, religion, sex, sexual orientation, marital status, national origin, age, disability, status as a veteran or Vietnam Era Veteran, or membership in the Reserves or National Guard.

Location:
Atlanta
Category:
Technology
