Senior Site Reliability Engineer, Deployments


MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. We enable organizations of all sizes to easily build, scale, and run modern applications by helping them modernize legacy workloads, embrace innovation, and unleash AI. Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across AWS, Google Cloud, and Microsoft Azure. Atlas allows customers to build and run applications anywhere, whether on premises or across cloud providers. With offices worldwide and over 175,000 new developers signing up to use MongoDB every month, it’s no wonder that leading organizations, like Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications.
Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization. Among these are our multi-cloud-provider Kubernetes infrastructure, networking, load balancing (including our public-facing edge and internal service mesh), and observability and alerting systems.
The Deployments team designs and maintains our continuous delivery infrastructure, ensuring reliable code deployment from development through production for all engineering teams. This infrastructure is primarily composed of Argo Workflows and ArgoCD. The team also provides tooling that enables clear system ownership and facilitates self-service onboarding for development teams.
This role will be based remotely in the United States.
The ideal candidate should:
Have 6+ years of experience in software development and operating distributed systems
Be proficient in Python, Go, or a similar language
Have proven experience building and operating large-scale continuous integration and continuous deployment (CI/CD) pipelines
Possess a customer-focused mindset
Value efficiency in processes and operations
Prefer automation over manual processes (“allergic to ops work”). We are a small team of software engineers with a strong bias towards software solutions that avoid toil
Have experience using and extending containerization technologies, particularly Kubernetes, to enhance application agility, optimize resource utilization, and accelerate time-to-market
Have expertise in cloud infrastructure platforms, including AWS, Google Cloud Platform (GCP), or Azure
Understand Linux operating system internals and networking concepts (e.g., TCP/IP, DNS, TLS, routing)
Expectations:
Contribute to developing a world-class continuous deployment experience, enabling the rapid and reliable shipment of MongoDB products
This includes, but is not limited to, contributing to open-source projects or engineering software-based approaches, such as Kubernetes operators, to streamline processes
Own the onboarding flow other engineering teams follow when launching a new product or service
Collaborate with other teams within Platform Engineering to ensure a consistent service-onboarding experience
Provide internal support for our deployment systems, including answering questions and addressing issues
Participate in a 24/7 on-call rotation to resolve issues involving the deployment infrastructure
Location:
United States