AWS Cloud Ops Engineer
Position Description:
CGI has an immediate need for an AWS Cloud Ops Engineer to join our team. This is an exciting opportunity to work in a fast-paced team environment supporting one of our largest customers. We take an innovative approach to supporting our client, working side-by-side in an agile environment using emerging technologies.
• This role is located at a client site in Reston, VA. A hybrid working model is acceptable.
We partner with 15 of the top 20 banks globally, and our top 10 banking clients have worked with us for an average of 26 years!
We have over 92,+ CGI Members in 40 countries and over 5k+ loyal Clients who are leveraging our end-to-end services across the globe.
Your future duties and responsibilities:
Under minimal supervision, provide operations support for office or business unit users of proprietary or custom application software in a 24/7 environment supporting Cloud Operations. The position requires some work during non-traditional business hours to support large-scale cloud platforms that run mission-critical applications. You will take point on end-to-end support and smooth operation of cloud-based infrastructure, and support change windows, incident response and resolution, and other scheduled maintenance activities. You will be trained in, and required to follow, Incident, Change and Problem standards, and will gain business and application knowledge through training and by resolving Production incidents and inquiries.
Key Job Functions
1. Incident Management
• Triage and resolve Production incidents related to the cloud platform and participate in root cause analysis and postmortem discussions.
• Provide on the job training and support to new and junior team members as required.
• Analyze cloud platform related Production incidents and engage business team(s) to determine the impact of each incident.
• Work with application support members and cloud support vendors to identify a workaround if a permanent solution cannot be reached in a timely manner. Provide a collaborative conduit between application/support teams and Cloud vendor support such as AWS, Azure, etc.
• Escalate to team leads in a timely manner when resolution cannot be achieved.
• Help recreate and test possible solutions and/or workarounds in lower environments prior to implementing in Production.
• Work closely with Cloud Engineering team and other support staff to identify and resolve incidents and create and implement long term remediation techniques and fixes.
• Identify and document known issues, and work with Cloud Engineering partners and vendor support to address recurrence and the identified workaround activity.
2. Operations, Monitoring, and Capacity Planning
• Cloud operations and infrastructure management - rehydration activities, IAM, security and compliance, availability, data protection, authentication and authorization, capacity and resource management, service metering and operational cost oversight, disaster recovery and mitigation.
• Create processes designed to measure system effectiveness and identify areas for improvement.
• Create processes intended to provide environment security, as well as automated processes to provide information on current specifications.
• Stay abreast of new technologies in the field and provide recommendations to organizational management on new solutions.
• Oversee the selection of orchestration tooling, as well as compliance audits and reporting.
• Identify, correct, and enhance important software tools; seek ways to enhance systems operations, with a focus on automation and minimizing cost.
• Build effective monitoring, alerts, and metrics for production services.
• Plan for adequate capacity of systems based on utilization metrics and planned projects to establish supply and demand forecasts.
3. Change Management
• Work closely with internal team members and other stakeholders to review proposed changes and help devise post implementation verification routines and system health checks.
• Assist in testing changes in lower environments to ensure solution is as desired.
• Create and review operational change tickets with senior team members when changes to Production are needed ensuring they are complete, clear and concise.
• Review operational change tickets with senior team members after they are submitted by other teams to make sure they are complete, clear and concise and meet all requirements of the change standard.
• Communicate impacts of change to all stakeholders in a timely manner.
• Coordinate with patch management teams as well as teams involved in infrastructure upgrades.
• Coordinate emergency changes per standards.
4. Compliance and Security
• Provide assistance in maintaining compliance with password resets, access reviews, remediation of Operational Incidents and MSIs.
• Assist in documenting remediation steps for operational incidents and/or an MSI.
• Engage with management, risk and compliance teams as needed.
Required qualifications to be successful in this role:
• 6-8 years of related experience in Production Support
• 2-3 years of related hands-on experience with AWS
• Broad knowledge of the AWS platform; AWS Certification required
• Experience with Azure a strong plus
• Solid knowledge of the AWS platform and its services, including but not limited to: AMIs, Route 53, VPC, EC2, S3, IAM, AWS CLI, EBS, ELB, SQS, CloudWatch, CloudTrail.
• Experience with Docker/Kubernetes and container orchestration.
• Hands on experience in AWS provisioning of systems, securing of VPC, implementation of Security Groups, Identity and Access Management, Backups, Restore and Disaster Recovery.
• System health monitoring and optimizing performance (CloudWatch, SolarWinds, Nagios, SumoLogic, Splunk).
• Administration of web servers running Apache, Tomcat, IIS, Nginx.
• Networking including DNS, certificate management, load balancing, firewalls and routing.
• Broad experience with software-defined and traditional networking.
• Strong understanding of Linux, including experience with server administration, monitoring, and troubleshooting.
• Broad experience with IaaS and PaaS.
• Broad experience building cloud infrastructure using infrastructure-as-code tools like AWS CloudFormation or Terraform.
• Exceptional problem-solving skills
• Excellent communications and collaboration skills required to develop required security policies and share information with business and technology staff.
• Project management and implementation skills to implement new technologies as necessary.
• Must have previous operations experience in cloud environments
• Strong written and oral communication skills.
• Ability to lead technical discussions between stakeholders.
• ServiceNow or other ticketing system
• Windows OS
• Autosys job management for scheduling, monitoring and reporting
• Middleware technologies such as WebLogic, JBoss, Apache, Global Load Balancers
• TIBCO/ESB
• Experience supporting various phases of the SDLC (Waterfall or Agile)
• Demonstrable knowledge of ITIL and/or Service Management
Education:
Bachelor's degree or equivalent preferred
Area of Study: Computer Science or IS/IT preferred
Other Information:
CGI is required by law in some jurisdictions to include a reasonable estimate of the compensation range for this role. The determination of this range includes various factors not limited to skill set, level, experience, relevant training, and licensure and certifications. To support the ability to reward for merit-based performance, CGI typically does not hire individuals at or near the top of the range for their role. Compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range for this role in the U.S. is $88,.00 - $,.00.
CGI’s benefits are offered to eligible professionals on their first day of employment to include:
• Competitive compensation
• Comprehensive insurance options
• Matching contributions through the 401(k) plan and the share purchase plan
• Paid time off for vacation, holidays, and sick time
• Paid parental leave
• Learning opportunities and tuition assistance
• Wellness and Well-being programs
Skills:
Analytical Thinking
Kubernetes
Linux
Splunk
Terraform
TIBCO
Waterfall Model
Location:
Reston