Runtime Analytics Engineer
Job Description
As a Remote Runtime Analytics Engineer, you will play a critical role in monitoring, analyzing, and optimizing real-time system behavior and application performance at scale. Your primary responsibility will be to design and implement observability and analytics solutions that provide actionable insights into runtime operations, including latency, throughput, resource consumption, and system anomalies.
This role is ideal for professionals with a deep understanding of telemetry data (logs, metrics, traces), performance profiling, and distributed system architecture. You'll collaborate closely with SREs, backend engineers, product teams, and platform owners to ensure systems are running reliably, efficiently, and transparently.
Key Responsibilities:
Design and implement real-time analytics pipelines that monitor and report on runtime behavior of services and systems
Collect, process, and visualize logs, metrics, and traces using tools like Prometheus, Grafana, Datadog, New Relic, or OpenTelemetry
Develop custom instrumentation and profiling solutions for high-performance services written in languages such as Java, Go, Python, or C++
Build dashboards, alerts, and visualizations to track performance, latency, resource utilization, and availability
Conduct root cause analysis of performance bottlenecks, outages, and latency spikes using distributed tracing and runtime diagnostics
Work with engineering teams to ensure observability best practices are embedded throughout the software development lifecycle
Support incident response and postmortems with accurate runtime diagnostics and historical data analysis
Optimize the efficiency and reliability of data collection agents, sidecars, and pipeline integrations
Create technical documentation for analytics architecture, instrumentation standards, and team workflows
Drive continuous improvements in system transparency and feedback loops for development and operations teams
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, Data Science, or a related technical field
2+ years of experience in observability, performance engineering, or backend analytics roles
Deep understanding of telemetry data (metrics, logs, traces) and application performance monitoring (APM) principles
Hands-on experience with tools such as Grafana, Prometheus, ELK/EFK, Splunk, Jaeger, Zipkin, or OpenTelemetry
Strong programming/scripting experience in Python, Go, Java, or Node.js
Experience with containerized and orchestrated environments (Docker, Kubernetes, ECS, etc.)
Strong problem-solving and analytical skills with a passion for systems-level thinking
Excellent communication and collaboration skills in distributed/remote environments
Location: Atlanta
Category: Business