A globally leading technology company is looking for an On-Device AI Runtime Engineer

A globally leading technology company is looking for an On-Device AI Runtime Engineer to help build high-performance machine learning inference systems and optimized AI drivers for edge devices. In this role, you will contribute to developing model lifecycle management solutions and delivering efficient on-device runtime drivers for AI inference. If you're passionate about edge AI, system-level optimization, and deploying ML models on resource-constrained devices, we invite you to apply.

Job Responsibilities:
- Design and implement robust Core ML model optimization pipelines for deploying large-scale ML models on resource-constrained devices.
- Support product engineering teams by consulting on AI model performance, iterating on inference solutions to solve real-world mobile/edge AI problems, and developing and delivering custom on-device AI frameworks.
- Interface with hardware and platform teams to ensure optimal utilization of neural processing units (NPUs), GPUs, and specialized AI accelerators across the device ecosystem.

Qualifications:
- Strong proficiency in Swift/Objective-C and Metal Performance Shaders.
- Familiarity with common ML model formats such as Core ML, ONNX, TensorFlow Lite, and PyTorch Mobile.
- Strong critical thinking, performance optimization, and low-level system design skills.
- Experience with model quantization, pruning, and hardware-aware neural architecture optimization.
- Experience with real-time inference pipelines and latency-critical AI applications.
- Understanding of mobile device thermal management, power consumption patterns, and compute resource allocation for AI workloads.

Type: Contract
Duration: 3 months (with possibility to extend to 18 months)
Work Location: Sunnyvale, CA (100% on-site)
Pay range: $77.00 - $92.00 (DOE)
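For candidates unfamiliar with the quantization experience mentioned above, here is a minimal, framework-agnostic sketch of symmetric 8-bit linear weight quantization, one of the simplest techniques in that family. The function names and the per-tensor scaling choice are illustrative assumptions, not part of any particular framework's API:

```python
def quantize_weights(weights, num_bits=8):
    """Map float weights to signed integers in [-(2**(b-1)-1), 2**(b-1)-1]
    using a single per-tensor scale (symmetric linear quantization)."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8-bit
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_weights(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_weights(weights)
approx = dequantize_weights(q, scale)
```

Storing the integers plus one float scale is what shrinks a model roughly 4x versus float32 weights; the reconstruction error per weight is bounded by half the scale, which is the trade-off hardware-aware optimization work tunes against NPU and GPU capabilities.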
Location: Santa Clara, CA, United States