Machine Learning Engineer, GenAI Quality
About Scale:
Scale’s Generative AI ML team develops models and services to power high-quality data generation and evaluation for the most advanced large language models on earth. We also conduct applied research on model supervision and algorithmic approaches that support frontier models for Scale’s applied-ML teams and the broader AI community. Scale is uniquely positioned at the center of the AI ecosystem as a leading provider of training and evaluation data, end-to-end ML lifecycle solutions, and frontier evaluations for public and private institutions.
About The Role:
This role focuses on developing ML systems to automate data quality evaluation and generation using large language models. You’ll build scalable systems to assess quality across accuracy, instruction adherence, factuality, and reasoning, and you’ll design robust evaluation frameworks to ensure alignment with human standards. This is one of the highest-impact areas in the company and directly accelerates the development of aligned, performant foundation models.
You’ll be deeply involved in the full lifecycle: from model design and fine-tuning, to prototyping, deployment, and monitoring. You’ll partner closely with engineering, research, and product teams to deliver cutting-edge solutions for both customers and internal GenAI data engines — Scale’s fastest-growing business.
If you’re excited about combining human and machine evaluation, scaling high-quality training data, and shaping the next generation of foundation models, we’d love to hear from you.
You will:
Design, fine-tune, and evaluate large language models for structured quality evaluation and data generation tasks
Develop robust evaluation frameworks to assess performance across accuracy, instruction following, reasoning, and other critical dimensions (a minimal illustrative sketch follows this list)
Build and maintain scalable ML services to automatically assess and generate high-quality training and evaluation data
Research and apply state-of-the-art techniques in LLM training, post-training alignment (e.g., instruction tuning, RLHF), and tool-augmented reasoning
Collaborate with research scientists, engineers, and product teams to integrate your work into production services used by top AI developers
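As a rough illustration of the evaluation-framework work described above, here is a minimal sketch of scoring model responses against a rubric of quality dimensions. Everything in it is hypothetical rather than Scale’s actual system: the dimension names, the Example class, and the judge_fn stub are placeholders, and a production service would route judge_fn to a fine-tuned LLM judge calibrated against human ratings.

```python
# Minimal illustrative sketch only; the rubric and all names are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical quality dimensions mirroring those mentioned in the role description.
DIMENSIONS = ["accuracy", "instruction_following", "reasoning"]


@dataclass
class Example:
    prompt: str
    response: str


def evaluate(examples: List[Example],
             judge_fn: Callable[[str, str, str], float]) -> Dict[str, float]:
    """Average a per-dimension judge score in [0.0, 1.0] over a batch of examples."""
    totals = {dim: 0.0 for dim in DIMENSIONS}
    for ex in examples:
        for dim in DIMENSIONS:
            totals[dim] += judge_fn(ex.prompt, ex.response, dim)
    return {dim: total / len(examples) for dim, total in totals.items()}


if __name__ == "__main__":
    # Stub judge for demonstration; in practice this would call an LLM-based grader.
    stub_judge = lambda prompt, response, dim: 1.0 if response else 0.0
    batch = [Example("Summarize the memo.", "The memo covers Q3 goals.")]
    print(evaluate(batch, stub_judge))
```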
Ideally you’d have:
3+ years of experience designing, training, and deploying ML models in production environments
Strong background in NLP, LLMs, and deep learning frameworks like PyTorch, TensorFlow, or JAX
Experience building microservices and deploying ML pipelines in cloud environments (e.g., AWS or GCP)
Practical knowledge of LLM fine-tuning and evaluation for tasks like factuality, instruction adherence, and chain-of-thought reasoning
Strong programming skills (e.g., Python) and a solid foundation in algorithms and data structures
Strong communication skills and experience working cross-functionally
Nice to haves:
Experience with post-training LLM techniques (instruction tuning, RLHF, tool use, or agent-based reasoning)
Familiarity with data evaluation pipelines, dataset curation, or scalable annotation workflows
Background in multimodal ML or model evaluation across domains such as code or long-context generation
- Location: San Francisco, CA, United States
- Salary: $250,000+
- Category: Engineering