Site Reliability Engineer

HappyRobot is a platform to build and deploy AI workers that automate communication. Our AI workers connect to any system or data source to handle phone calls, email, messages…

We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We work primarily with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We raised a Series A round from a16z and YC, and we're growing very fast.

We're looking for rockstars with a relentless drive, unstoppable energy, and a true passion for building something great: people ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

About the Role

We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You'll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you'll shape how reliability is done: reducing incident load, building internal tooling, and directly improving developer focus and system uptime.
If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

Must-Have

- 1+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and ability to dive into unfamiliar backend codebases
- Comfort with Python and Go for reading code and writing small tools and utilities
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure, especially during live incidents

Nice-to-Have

- Experience working with distributed systems or services at scale
- Experience building or maintaining internal tooling for on-call teams or reliability workflows
- Familiarity with deployment pipelines, CI/CD, or infra-as-code
- Experience improving system observability (e.g., custom metrics, traces, log pipelines)

Opportunity to work at a high-growth AI startup, backed by top investors.

- Fast Growth - Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation - Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy - Take full ownership of projects and ship fast.
- Work With the Best - Join a world-class team of engineers and builders.

The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller.

By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position.
Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer.

Regarding the retention period, your personal data will be deleted after three months of inactivity, in compliance with the GDPR and legislation on the protection of personal data.

If you wish to exercise your rights of access, rectification, deletion, portability or opposition in relation to your personal data, you can do so through security@happyrobot.ai, subject to the GDPR.

For more information, visit https://www.happyrobot.ai/privacy-policy

By submitting your request, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.
Location:
San Francisco, CA, United States