Machine Learning Engineer, Efficiency Engineering - USDS

The Efficiency Engineering team builds innovative tools and applications that help IT operations and DevOps teams reach new levels of efficiency. We're a tight-knit crew of experienced developers, engineers, and problem solvers with a shared vision: streamlining operations, reducing manual workload, and empowering teams to do their best work.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Design and develop large-scale ML system architecture, including solving technical system problems of high concurrency, reliability, scalability, etc.
- Develop end-to-end deep model inference solutions for internal business units such as Search, and for related Large Language Model (LLM) based systems.
- Provide highly automated, high-performance model optimization solutions for frameworks such as PyTorch and TensorFlow. Technical approaches include subgraph matching, compilation optimization, model quantization, heterogeneous hardware, etc.
- Manage the large-scale GPU compute cluster for our global businesses, improving its utilization through methods such as elastic scheduling, GPU overselling, and task orchestration.
- Collaborate cross-functionally with the algorithm department on joint optimization of algorithms and systems.

Minimum Qualifications:
- B.Sc. or higher degree in Computer Science or a related field from an accredited and reputable institution.
- Proficiency in C/C++/Python and solid programming skills.
- Familiarity with deep learning frameworks (TensorFlow/PyTorch).
- Experience developing and deploying large-scale systems.
- Good communication and teamwork skills, with the ability to explain technical concepts clearly to teammates.
- Experience improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
- 4+ years of industry experience with a solid theoretical foundation in machine learning.

Preferred Qualifications:
- Experience designing large-scale LLM-powered applications.
- Agile, quick self-learner who is highly self-motivated, with a strong sense of product ownership and creative problem-solving skills.
- Deeply passionate about software development and building great web applications.
- Ability to perform independent research to solve complex technical problems.
- Good collaborator and team player, comfortable working in a fast-moving, culturally diverse, and globally distributed team environment.
- Passionate about technology and solving challenging problems.
- Experience driving collaboration across cross-functional teams to deliver shared goals.
- Strong communication and teamwork skills.

Candidates for this position must be legally authorized to work in the United States. This position is not eligible for visa sponsorship or support.

Location: San Jose