Cloud Site Reliability Engineer

We are seeking a highly skilled Site Reliability Engineer with at least 3 years of experience to join our team. The ideal candidate will have a strong background in cloud technologies, with a focus on designing, implementing, and managing cloud-based solutions. As a Site Reliability Engineer, you will play a key role in ensuring the availability, performance, and security of our cloud infrastructure.
In this role you will:
* Lead day-to-day technical operations, ensuring the highest levels of availability, reliability, and scalability of our services.
* Implement best practices for cloud security, including identity and access management, encryption, and network security.
* Provide technical expertise to handle customer escalations and ensure stability in customer environments.
* Conduct performance analysis and lead monitoring initiatives on multiple hosted products/platforms.
* Maintain operational runbook procedures for all production systems and keep the knowledge base documented.
* Administer incident management activities (detection, recording, classification, and closure) and provide timely escalations and notifications as required by procedure.
* Participate in the on-call rotation to respond to cloud-related incidents and emergencies.
* Troubleshoot and resolve complex technical issues in a timely manner.
* Monitor and optimize cloud infrastructure for performance, cost, and security.
* Collaborate with cross-functional teams to troubleshoot and resolve complex cloud-related issues.
* Mentor junior team members and provide technical guidance and support.
You've got what it takes if you have:
* U.S. citizenship (required).
* A bachelor's degree or higher in computer science, engineering, or a related field, or equivalent experience.
* 3+ years of experience in cloud operations.
* Comprehensive understanding of cloud computing principles and architectures.
* Extensive experience in Linux/Unix environments.
* Proficiency in containerization technologies like Docker and Kubernetes.
* Strong scripting skills in Python or Bash.
* Proficiency in debugging and optimizing Java-based applications.
* Hands-on experience in deploying, optimizing, and troubleshooting applications on Tomcat and JBoss application servers.
* Hands-on experience in managing and optimizing Memcached, Nginx, ActiveMQ, Elasticsearch, and Redis applications.
* Experience with monitoring and logging tools such as New Relic and the ELK stack.
* Sound knowledge of networking concepts, including TCP/IP, DNS, and VPN.
* Proficiency in automation, CI/CD, and source control tools like Ansible, Jenkins, and Bitbucket.
* Thorough understanding of monitoring and alerting tools such as Nagios, New Relic, Grafana, and Checkmk.
* Experience with distributed storage technologies such as NFS, NetApp, and Amazon S3, as well as dynamic resource management frameworks (e.g., Kubernetes).
* Experience working in both data center and AWS cloud environments.
* Strong communication and collaboration skills.
* Excellent troubleshooting and problem-solving skills.
Location:
Dublin, CA, United States