Machine Learning Engineer Intern (FeatureStore) - 2025 Summer (PhD)
Team Introduction:
The TikTok Data Ecosystem Team plays a critical role in supporting TikTok’s personalized recommendation system, which serves over 1 billion users. We are responsible for building scalable, reliable, and high-performance infrastructure for storing and serving machine learning features, especially the user behavior sequences and contextual embeddings used in large-scale recommendation and pretraining models. Our work sits at the intersection of systems and machine learning: ensuring training-serving consistency, low-latency access to temporal features, and scalable ingestion pipelines across online and offline environments. We explore and integrate with various underlying storage engines, including RocksDB, HBase, and time-series databases, depending on the access pattern, feature type, and serving latency required by ML models.
Responsibilities:
- Build and optimize the core infrastructure of TikTok’s feature store, powering both training data pipelines and real-time inference systems.
- Design efficient storage strategies for user behavior sequences, long-range contextual features, and sparse embeddings — ensuring freshness, consistency, and high availability.
- Work with underlying storage engines such as RocksDB, HBase, and time-series databases to support feature retention, versioning, compaction, and fast lookup.
- Collaborate with recommendation algorithm teams to design schemas and access patterns tailored to evolving model needs.
- Integrate online and offline data pipelines to reduce training-serving skew and support continuous training and A/B testing scenarios.
- Investigate techniques such as temporal sampling, embedding quantization, caching, and hybrid tiered storage to improve cost-efficiency and latency.
Minimum Qualifications:
- Currently pursuing a PhD degree or above in Computer Science, Software Engineering, or a related technical field.
- Solid foundation in distributed systems, data storage, and stream/batch processing architectures.
- Experience in programming with Java, C++, or Python.
- Understanding of key-value stores, LSM-tree architectures, or time-series databases at a system level.
- Eagerness to work on ambiguous, real-world infrastructure problems that impact ML product outcomes.
Preferred Qualifications:
- Graduating in December 2025 or later with intent to return to your program.
- Experience working with RocksDB, HBase, or time-series storage engines like IoTDB, OpenTSDB, or custom LSM-tree variants.
- Familiarity with feature store design, feature lifecycle management, and streaming ingestion pipelines.
- Understanding of recommendation system workflows, such as two-tower models, real-time CTR prediction, or user intent modeling.
- Contributions to open-source storage/ML infra projects or participation in ML system hackathons.
Location:
- San Jose