Cloud Site Reliability Engineer, Dublin, CA, United States

We are seeking a highly skilled Site Reliability Engineer with at least 3 years of experience to join our team. The ideal candidate will have a strong background in cloud technologies, with a focus on designing, implementing, and managing cloud-based solutions. As a Site Reliability Engineer, you will play a key role in ensuring the availability, performance, and security of our cloud infrastructure.

In this role you will:

* Lead the day-to-day technical operations, providing the highest levels of availability, reliability, and scalability of the services.

* Implement best practices for cloud security, including identity and access management, encryption, and network security.

* Provide technical expertise to handle customer escalations and ensure stability in customer environments.

* Conduct performance analysis and lead monitoring initiatives on multiple hosted products/platforms.

* Maintain operational run book procedures for all production systems and document the knowledge base.

* Administer incident management activities (detection, recording, classification, and closure) and provide timely escalations and notifications as required by procedure.

* Participate in on-call rotation to respond to cloud-related incidents and emergencies.

* Troubleshoot and resolve complex technical issues in a timely manner.

* Monitor and optimize cloud infrastructure for performance, cost, and security.

* Collaborate with cross-functional teams to troubleshoot and resolve complex cloud-related issues.

* Mentor junior team members and provide technical guidance and support.

You've got what it takes if you have:

* U.S. citizenship is required.

* Minimum bachelor's degree in computer science, engineering, or a related field, or equivalent experience.

* 3+ years of experience in cloud operations.

* Comprehensive understanding of cloud computing principles and architectures.

* Extensive experience in Linux/Unix environments.

* Proficiency in containerization technologies like Docker and Kubernetes.

* Strong scripting skills in Python or Bash.

* Proficiency in debugging and optimizing Java-based applications.

* Hands-on experience in deploying, optimizing, and troubleshooting applications on Tomcat and JBoss application servers.

* Hands-on experience in managing and optimizing Memcached, Nginx, ActiveMQ, Elasticsearch, and Redis applications.

* Experience with monitoring and logging tools such as New Relic and the ELK stack.

* Sound knowledge of networking concepts, including TCP/IP, DNS, and VPN.

* Proficiency in automation, CI/CD, and source control tools such as Ansible, Jenkins, and Bitbucket.

* Thorough understanding of monitoring and alerting tools such as Nagios, New Relic, Grafana, and Checkmk.

* Experience with distributed storage technologies such as NFS, NetApp, and Amazon S3, as well as dynamic resource management frameworks (e.g., Kubernetes).

* Experience working in data center environments and on the AWS cloud platform.

* Strong communication and collaboration skills.

* Excellent troubleshooting and problem-solving skills.

Location: Dublin, CA, United States