Associate Data Engineer, NLP/LLM Infrastructure
14 Days Old
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
We are seeking a Data Engineer specializing in linguistic data infrastructure to join our team. With guidance from senior team members, this role will focus on implementing and maintaining data pipelines and storage solutions that support our Natural Language Processing (NLP) and Large Language Model (LLM) initiatives. The ideal candidate will combine data engineering fundamentals with an interest in linguistic data structures to help build scalable systems that effectively process, store, and deliver text data for AI applications. This role offers significant growth opportunities to develop expertise in linguistic data engineering and AI infrastructure.
Responsibilities:
Linguistic Data Pipeline Development: Assist in designing, implementing, and maintaining ETL pipelines for harvesting, cleaning, and processing large text corpora from various sources.
Text Data Infrastructure: Help build and optimize database schemas and storage solutions specifically designed for linguistic data, with a focus on efficient querying and retrieval of text patterns.
Database Management: Contribute to building and maintaining specialized text corpora for training domain-specific language models, with a focus on terminology extraction and style pattern identification in aggregated databases.
Data Quality & Governance: Implement data validation processes to ensure linguistic data quality, consistency, and compliance with relevant standards and requirements.
Cross-functional Collaboration: Work with stakeholders to understand requirements and help translate them into data engineering solutions that support Language Engineering and AI R&D initiatives.
Requirements:
Bachelor's degree in Linguistics, Computer Science, Software Engineering, Data Engineering, or related technical field
Basic understanding of NLP concepts and text processing techniques
Basic to intermediate experience with Python programming for data engineering work
Exposure to prompt engineering, prompt tuning, and interacting with LLMs
Familiarity with SQL and database design principles
Knowledge of data modeling concepts for structured and unstructured data
Ability to follow plans and to collaborate with project leads to meet goals
Nice to Have:
Experience with PyTorch and Neo4j
Coursework or projects building knowledge graphs
Knowledge of linguistics or computational linguistics
Experience working with large text corpora
Experience with text vectorization and embedding techniques
- Location: San Francisco, CA, United States
- Salary: $200,000 - $250,000
- Category: IT & Technology