Senior AI/ML Engineer, San Francisco

100% Remote

Job Title: Senior AI/ML Engineer - Large Language Model Pretraining (100B+ Parameters)

Location: West Coast, 100% Remote

Role Overview

We are seeking Senior AI/ML Engineers with PhDs or Master's degrees in Computer Science or related fields from top 20 universities. You will lead the pretraining of massive LLMs (100B+ parameters), work that requires deep expertise in distributed training, large-scale optimization, and model architecture. This is a rare opportunity to work with petabyte-scale datasets and cutting-edge compute clusters in a high-impact environment.

Key Responsibilities

Architect and implement large-scale training pipelines for LLMs with 100B+ parameters.
Optimize distributed training performance across thousands of GPUs/TPUs.
Collaborate with research scientists to translate experimental results into production-grade training runs.
Manage and preprocess petabyte-scale datasets for pretraining.
Implement state-of-the-art techniques in scaling laws, model parallelism, and memory optimization.
Conduct rigorous benchmarking, profiling, and performance tuning.
Contribute to Client research in LLM architecture, training stability, and efficiency.

Required Qualifications

Advanced degree (PhD or Master's) in Computer Science, Machine Learning, or related field from a top 20 global university in CS.
3+ years of hands-on experience with large-scale deep learning model training.
Proven experience in pretraining models exceeding 10B parameters, preferably 100B+.
Deep expertise in distributed training frameworks (DeepSpeed, Megatron-LM, PyTorch FSDP, TensorFlow Mesh, JAX/TPU).
Proficiency with parallelism strategies (data, tensor, pipeline) and mixed precision training.
Experience with large-scale cloud or HPC environments (AWS, Azure, GCP, Slurm, Kubernetes, Ray).
Strong skills in Python, CUDA, and performance optimization.
Strong publication record in top-tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.) preferred.

Preferred Skills

Experience with LLM fine-tuning (RLHF, LoRA, PEFT).
Familiarity with tokenizer development and multilingual pretraining.
Knowledge of scaling laws and model evaluation frameworks for massive LLMs.
Hands-on work with petabyte-scale distributed storage systems.

E-Verify: United States Employment Opportunities Only

E-Verify is an internet-based system, operated by the Department of Homeland Security and the Social Security Administration, that allows employers to confirm an individual's eligibility to work in the United States. Under the E-Verify rules, effective September 8, 2009, federal agencies subject to the Federal Acquisition Regulation are required to modify, and include in new contracts, a provision that requires federal contractors and subcontractors to use E-Verify. ITCO Solutions is required to adhere to these requirements.
