A leading global technology company is seeking an On-Device AI Runtime Engineer to help build high-performance machine learning inference systems and optimized AI drivers for edge devices. In this role, you will contribute to model lifecycle management solutions and deliver efficient on-device runtime drivers for AI inference. If you're passionate about edge AI, system-level optimization, and deploying ML models on resource-constrained devices, we invite you to apply.
Job Responsibilities:
Design and implement robust Core ML model optimization pipelines for deploying large-scale ML models on resource-constrained devices.
Support product engineering teams by consulting on AI model performance, iterating on inference solutions to solve real-world mobile/edge AI problems, and developing/delivering custom on-device AI frameworks.
Interface with hardware and platform teams to ensure optimal utilization of neural processing units (NPUs), GPUs, and specialized AI accelerators across the device ecosystem.
Qualifications:
Strong proficiency in Swift/Objective-C and Metal Performance Shaders.
Familiarity with common ML model formats such as Core ML, ONNX, TensorFlow Lite, and PyTorch Mobile.
Strong critical thinking, performance optimization, and low-level system design skills.
Experience with model quantization, pruning, and hardware-aware neural architecture optimization.
Experience with real-time inference pipelines and latency-critical AI applications.
Understanding of mobile device thermal management, power consumption patterns, and compute resource allocation for AI workloads.
Type: Contract
Duration: 3 months (with possibility to extend to 18 months)
Work Location: Sunnyvale, CA (100% On site)
Pay range: $77.00 - $92.00 (DOE)
Location: Santa Clara, CA, United States