Engineering - DXR Engineering - Systems Engineer - Associate - Dallas
The Procmon Platform delivers a highly scalable and reliable ecosystem for scheduling business-critical jobs across Goldman Sachs.
Our platform is responsible for scheduling tens of millions of daily jobs for Global Banking & Markets, Asset & Wealth Management, Risk and other business and engineering functions.
The ecosystem includes a number of high-availability, very large-scale systems, including:
Job scheduling
Event streaming
Log shipping
Data warehouses
Security infrastructure
RESPONSIBILITIES
Own technical operations for systems that manage hundreds of thousands of compute cores
Build observability for new deployments to ensure robustness from day one, and for mature deployments to identify and implement improvements
Troubleshoot and resolve issues with block devices, file descriptors, and packet loss
Lead real-time outage investigations and present postmortems to senior management
Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
Partner with our development team throughout development and operations
Plan and manage deployments and migrations (including end-of-life programs)
Plan and implement robust business continuity and security programs
Provide regional coverage for the Procmon platform and participate in on-call support
REQUIREMENTS
Excellent problem-solving and automation skills
Strong Linux fundamentals and system administration skills
Good networking fundamentals (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
Experience working with distributed computing systems and cloud computing environments
Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
Able to operate effectively in a mission-critical, highly regulated financial services environment
Location: Dallas