Machine Learning Infrastructure Engineer
Job Description
The Diffuse Project is dedicated to advancing our understanding of protein motion through diffuse scattering, a signal in X-ray crystallography that is currently under-utilized or ignored but that will unlock our ability to measure protein dynamics. We are bringing together a diverse team of researchers, software developers, and beamline scientists to accomplish our mission. We are committed to the Open Science principles of making all of our work, software, and data open and FAIR all along the way. The Diffuse Project is part of, and generously funded by, the Astera Institute. You can read more about The Diffuse Project here, and Astera’s mission, vision, and programming here.
Position Summary: The Diffuse Project is seeking a Machine Learning Infrastructure Engineer to lead the development of robust, scalable backend systems that power machine learning–driven discoveries in structural biology. You will work at the intersection of scientific research and software engineering, partnering with researchers to train, test, and deploy ML models directly on experimental data (electron density/structure factors) from X-ray crystallography and cryo-EM.
This role is ideal for someone with deep experience in ML infrastructure and scientific computing who thrives in a collaborative and product-minded environment. This is a 6-month assignment with potential for extension.
Key Responsibilities:
Architect, build, and maintain ML infrastructure pipelines for model training, validation, and deployment across diverse experimental datasets in collaboration with scientists
Design and manage data ingestion and preprocessing workflows for structural biology data (PDBs, cryo-EM maps, diffraction patterns, etc.) in collaboration with scientists
Develop and maintain backend services and APIs that support modular access to models, datasets, and experiment metadata
Support GPU/accelerated training on local HPC clusters or cloud platforms
Implement data versioning, model tracking, and reproducibility tools
Collaborate with ML researchers and experimentalists to streamline the integration of new algorithms, datasets, and evaluation metrics
Ability to work effectively in a multidisciplinary team environment
Strong programming skills in Python, ideally with experience in PyTorch
Deep understanding of machine learning infrastructure, including model training pipelines, GPU utilization, experiment tracking, and deployment
Proficiency in backend development (e.g., REST APIs, containerization with Docker, workflow management, and data engineering tools)
Experience with distributed compute environments
Solid understanding of scientific computing workflows, version control, and reproducibility principles
At least two years of experience working on ML models
(Bonus) Familiarity with structural biology data formats
(Bonus) Experience designing systems for diffusion-based models
W-2, fixed-term employment, 6-month assignment. Potential extension based on performance and business needs.
Location: This role is remote, with access to our office in Emeryville, CA. Some travel may be required from time to time for in-person collaboration and work.
Compensation: The posted salary range is based on location in the Bay Area. The successful candidate will receive a competitive compensation package, commensurate with their experience and location.
- Location: Emeryville
- Category: Technology