Senior Software Engineer - Infrastructure, San Francisco, CA, United States

The Salesforce Industries Infrastructure Engineering team manages the public cloud infrastructure for our services. We design, provision, and maintain the infrastructure throughout its lifecycle. We are responsible for maintaining 99.99% service availability for one of the largest and most trusted cloud platforms in the world.

We are looking for a Senior Software Engineer (Infrastructure) to take an active role in implementing our vision of fully automating infrastructure lifecycle management, including incident remediation and prevention for our SaaS services. If you are passionate about customer success and delivering high-quality SaaS services, are a creative thinker, love to automate, and believe that everything can and should be automated, then this is your dream career opportunity.

We are seeking a skilled and detail-oriented individual to join our team. As a Senior Software Engineer, you will be responsible for onboarding new services onto the Salesforce Hyperforce public cloud environment, maintaining existing ones, and ensuring the availability and performance of all our services, systems, and applications. This role requires a deep understanding of public cloud system architecture and strong analytical skills to identify potential bottlenecks and areas for improvement.

You will work closely with the Development, Quality, Performance, and Support teams and support multiple sub-clouds within the Industries verticals. The candidate must be a self-starter with excellent analytical skills. A passion for security and availability, along with prior experience working at or closely with CRM and cloud service providers, is a major plus.

Responsibilities

- Design, provision, and maintain the infrastructure throughout its lifecycle.
- Monitor the availability and performance of cloud services, systems, and applications.
- Work with engineers on the design, deployment, and continuous improvement of meaningful infrastructure services (e.g., logging, monitoring, and alerting).
- Analyze system and application metrics to identify potential performance issues or bottlenecks.
- Design, implement, and maintain monitoring tools and systems to track and report on availability and performance.
- Collaborate with cross-functional teams, including architects, developers, and infrastructure teams, to identify and resolve issues.
- Provide guidance on long-range platform requirements and operational guidelines, with a focus on automation and continuous improvement of platform service operability and availability.
- Develop and maintain monitoring dashboards and reports that provide visibility into the availability and performance of architectural services.
- Participate in capacity planning exercises to ensure the scalability and reliability of our systems and services.
- Conduct root cause analysis for incidents and recommend improvements.
- Stay up to date on industry trends and best practices related to availability monitoring and performance optimization.
- Continuously raise our standard of engineering excellence by implementing standard processes for coding, testing, and deployment.
- Document monitoring procedures, configurations, and troubleshooting guides.

Requirements

- Solid understanding of the configuration, deployment, management, and maintenance of large cloud-hosted systems, including auto-scaling, monitoring, performance tuning, troubleshooting, and disaster recovery.
- Proficiency in designing and implementing sophisticated monitoring and alerting solutions that maintain 99.99% or higher service availability.
- Participate in the team's on-call rotation to address complex problems in real time and keep services operational and highly available.
- Expertise in cloud computing platforms such as AWS, Azure, or Google Cloud Platform is a must.
- Proven work experience as a monitoring specialist or in a similar role.
- Excellent analytical and problem-solving skills, with the ability to identify and resolve performance and service availability issues.
- In-depth, hands-on experience with Linux, networking, servers, and cloud architectures.
- Solid understanding of network protocols, infrastructure components, and virtualization technologies (Kubernetes preferred).
- Bachelor's degree in computer science, information technology, or a related field; a master's degree is a plus.
- 7+ years of experience in software development with a focus on service availability and reliability.
- 5+ years of experience with large-scale, high-volume SaaS, PaaS, or other cloud provider environments.
- Fluency in one or more scripting languages, such as Python or Ruby, is a must.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Excellent written and verbal communication, with the ability to collaborate and rally support.

Location: San Francisco, CA, United States
Category: IT & Technology