Machine Learning Engineer (Data)

About us

Inception is a generative AI startup. Leveraging breakthrough AI research, we are training next-generation large language models (LLMs) powered by diffusion. Unlike existing auto-regressive models, which output only one token at a time, diffusion LLMs can output many tokens in parallel. This means that they are several times faster and can leverage their additional test-time compute to improve quality. They also enable fine-grained control over their outputs to adhere to specific schema and semantic constraints, and they provide a unified paradigm for combining language with other data modalities, including audio, images, and videos.

Our team is led by Stefano Ermon (co-inventor of diffusion models, flash attention, and DPO; faculty at Stanford), Aditya Grover (co-inventor of node2vec and decision transformers; faculty at UCLA), and Volodymyr Kuleshov (prev. co-founder and CTO at Afresh Technologies; faculty at Cornell), and includes engineers from Google DeepMind, Meta AI, Microsoft AI, and OpenAI. We are currently deploying large-scale diffusion LLMs at Fortune 500 companies.

Role Overview

We seek experienced Machine Learning Engineers to shape how we collect, process, and curate the datasets that power our models. This interdisciplinary role combines engineering expertise with research insights to build scalable data pipelines, develop synthetic data generation techniques, and ensure our models are trained on high-quality, diverse datasets.

Key Responsibilities

- Design and implement scalable data pipelines for processing petabyte-scale datasets
- Build systems for web crawling, data ingestion, and real-time data processing to support model training operations
- Develop tools and frameworks for efficient data storage, retrieval, and versioning across distributed systems
- Develop techniques for collecting, augmenting, filtering, and synthesizing training data using LLMs and other ML methods
- Create evaluation frameworks to measure data diversity, quality, and representativeness
- Build systems for human-in-the-loop data validation and annotation workflows
- Ensure data collection adheres to privacy regulations
- Collaborate with ML researchers to identify data requirements and optimize training recipes

Qualifications

- BS/MS/PhD in Computer Science, Machine Learning, or related field (or equivalent experience)
- 3+ years of experience building data processing pipelines at scale, particularly with AI/ML applications
- Strong proficiency in Python and experience with data processing frameworks (Apache Spark, Beam, Airflow)
- Experience with distributed computing and large-scale data storage systems (HDFS, S3, BigQuery)
- Solid understanding of machine learning fundamentals and experience with ML frameworks (PyTorch, TensorFlow)
- Experience with SQL and NoSQL databases for managing structured and unstructured data
- Familiarity with version control (Git) and infrastructure as code practices
- Strong analytical skills with attention to detail in data quality assessment
- Excellent communication skills to work effectively with researchers and engineers

Preferred Skills

- Experience with large language models and understanding of tokenization, embeddings, and model architectures
- Familiarity with web scraping, crawling technologies, and Common Crawl datasets
- Experience managing human annotation workflows and quality control processes
- Experience with vector databases and embedding-based retrieval systems
- Familiarity with synthetic data generation techniques and data augmentation strategies
- Knowledge of data privacy regulations and ethical AI practices

Why Join Us

- Impact: Deploy LLMs that transform how millions of users work, create, and solve real-world problems.
- Innovation: Pioneer novel data recipes for diffusion LLMs.
- Growth: Enjoy a fast-paced, collaborative environment where your contributions will directly shape the future of generative AI.

Perks & Benefits

- Competitive salary and equity in a rapidly growing startup.
- Flexible vacation and paid time off (PTO).
- Health, dental, and vision insurance.
- Professional development opportunities (conferences, courses, etc.).

This is an exciting opportunity to join a startup at the forefront of AI development! If you’re ready to make a tangible impact in the world of generative AI, apply today.

We are an equal opportunity employer and encourage candidates of all backgrounds to apply.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Research Services
Location: San Mateo, CA, United States