Senior Software Reliability Engineer - San Jose Bay Area (Hybrid)

Job Description

Company Overview:

At Nile, we envision an enterprise network that inherently defends against cyber threats, eliminates lateral attack vectors like ransomware, and operates free of complexity. Our goal is to deliver Campus Network-as-a-Service (NaaS) that makes network operations virtually invisible to our customers by pushing the boundaries of autonomy. Imagine a network that continuously monitors, optimizes, and upgrades itself—all without the need for human intervention. Our audacious journey began in 2018 when we brought together a team of industry veterans and visionaries in networking, cybersecurity, cloud software, and AI to disrupt a $100 billion enterprise networking market, starting with the wired and wireless LAN. Today, our Nile Access Service is redefining connectivity as a service for organizations worldwide, from cutting-edge technology companies to leading healthcare and financial institutions, and beyond.

Where do we go from here? Well, that’s where you come in. We are expanding in all areas, bringing in some of the brightest talent to further shape Nile’s future, prepare for growth, and tackle tough tasks to ensure our momentum never slows.

About the Role
You’ll be instrumental in shaping test architecture, automation strategy, and deployment readiness for large-scale distributed systems—working in close alignment with engineering, product, and field teams to ensure seamless customer experiences.

Key Responsibilities

  • Architect and maintain automated test suites using Python, Pytest, or similar frameworks, tailored to Kubernetes-based systems.
  • Implement robust CI/CD workflows via Jenkins, integrating observability checkpoints to maintain deployment quality.
  • Validate Kafka-driven event flows and telemetry pipelines, and ensure observability using tools like Elastic Stack, OpenTelemetry, and Druid.
  • “Wear the customer hat”—simulate real-world usage, perform exploratory testing, and anticipate production edge cases.
  • Collaborate with field engineers and the broader customer community to drive readiness, capture deployment insights, and influence product improvements.
  • Contribute to high-scale testing disciplines including chaos testing, network fault injection, and performance validation.

Required Skills

  • Strong command of Python (or equivalent languages), with experience in frameworks like Pytest.
  • Hands-on expertise with Kubernetes, Kafka, and Elastic Stack in cloud-native environments.
  • Deep understanding of CI/CD pipelines, Jenkins, and automation strategies that build in quality at every stage.
  • Proficiency with observability stacks and proactive debugging approaches.
  • Familiarity with networking protocols, network deployments, and distributed system diagnostics.
  • Familiarity with the new NaaS paradigm: how networks are provisioned, managed, monitored, and troubleshot entirely from the cloud.

Location: San Jose
Category: Technology
