SOW - Cloud Engineer / Platform Support Specialist - Plano, TX.

Job Title: Cloud Engineer / Platform Support Specialist
Location: Plano, TX
Overview: We are seeking a highly skilled Cloud Engineer / Platform Support Specialist to join our team. This role involves providing advanced-level support for a cloud platform within a large enterprise environment that hosts thousands of applications on AWS. The successful candidate will act as the first point of contact for application developers encountering technical issues, using a ticketing system to manage incidents. This position requires a strong foundation in coding, software, infrastructure, and cloud technologies, and operates within a follow-the-sun support model. It also requires strong communication skills, including the ability to clearly articulate problem statements and solutions.
Key Responsibilities:
- Deliver incident management and advanced-level support for the AWS platform, which hosts a large volume of applications.
- Serve as the initial point of contact for application developers via a ticketing system.
- Communicate effectively with users at various organizational levels.
- Implement and utilize automation to support the scalability of the environment.
- Optimize operational processes to enhance efficiency, reliability, and security.
- Train users to self-diagnose and troubleshoot issues for expedited resolution.
- Conduct thorough investigations into issues to identify root causes and document strategies to prevent recurrence.
- Provide support for public cloud environments, particularly AWS.
- Manage events and incidents efficiently.
- Develop and implement scalable automation processes to handle tasks in a large-scale environment.
- Analyze and debug incidents, and follow up to gather feedback and prevent future issues.
- Support different development environments, including Unix, Linux, Mainframe, and Windows.
Required Skills and Experience:
- Extensive cloud experience, particularly with AWS (S3, ECS).
- Amazon Elastic Kubernetes Service (EKS): Experience deploying, managing, and troubleshooting Kubernetes clusters on AWS EKS.
- Kubernetes Administration: Strong understanding of Kubernetes architecture, including pods, deployments, services, and networking.
- Helm & Kubernetes Operators: Familiarity with Helm charts for package management and Kubernetes Operators for automation.
- Cluster Security & RBAC: Knowledge of Kubernetes Role-Based Access Control (RBAC), security policies, and best practices.
- Scaling & Performance Optimization: Experience with autoscaling, load balancing, and optimizing Kubernetes workloads.
- Monitoring & Logging: Hands-on experience with tools like Prometheus, Grafana, Fluentd, or AWS CloudWatch for monitoring Kubernetes clusters.
- Containerization & Orchestration: Strong experience with Docker and AWS container services (ECS and AWS Fargate).
- Terraform: Strong experience in writing, managing, troubleshooting, and optimizing Terraform configurations for AWS infrastructure.
- Infrastructure as Code (IaC) Expertise: Deep understanding of IaC principles, including automation, version control, and modularization.
- AWS Cloud Services: Hands-on experience with AWS services such as EC2, S3, Lambda, VPC, IAM, and CloudFormation.
- Security Best Practices: Knowledge of AWS security policies, identity and access management (IAM), and compliance standards.
- CI/CD Integration: Experience integrating Terraform with CI/CD pipelines for automated deployments.
- Commitment to automating processes for continuous improvement.
- Proficiency in the SDLC, with the ability to read code (Java and Python).
- Troubleshooting & Optimization: Ability to diagnose and resolve infrastructure issues, optimize performance, and ensure scalability.
- Strong troubleshooting and diagnostic skills for security and access issues in a large enterprise environment.
- Excellent communication skills: Ability to analyze details, understand incident causation, and implement preventive measures to ensure reliability and security.
Nice to have:
- Database management skills (Oracle DBA, Cassandra DBA, CockroachDB), including performance tuning, connectivity, backups, indexes, and monitoring alarms.
- Middleware and messaging experience (Kafka, MQ).
- Experience with Tomcat.
- System engineering and administration skills (Unix/Linux).
- Java or Python development.