Data Engineer III (Vector Database Specialist)


About Catalytic Data Science (CDS): Catalytic Data Science is a fast-growing SaaS company building cutting-edge, AI-driven solutions for regulatory affairs professionals shaping innovation in life sciences. Our engineering team leverages generative AI to extract insights from complex, unstructured data at scale. We believe in clean code, collaborative problem-solving, and a culture where engineers have a direct impact on meaningful products used by global life sciences organizations. Our customers are passionate about making the world a better place, and we are inspired by the opportunity to help them. If you are passionate about solving technical challenges that improve medical innovation and regulatory processes, you'll find your next home with us.
Who You Are: You are a hands-on data engineer experienced in building robust data pipelines, designing scalable storage solutions, and enabling high-performance data retrieval, especially with modern vector databases and cloud-based services. You excel at optimizing flows for large-scale, often unstructured datasets, and understand how to balance efficiency, reliability, and security in your solutions.
What You'll Do: Architect and manage pipelines for ingestion, transformation, and indexing of unstructured/structured data into vector databases (e.g., Pinecone, FAISS, Weaviate). Optimize vector search and retrieval solutions for scalability and performance. Design data models in support of LLM/RAG applications. Collaborate with ML, NLP, and backend teams to ensure robust end-to-end data flows. Maintain data quality, lineage, and compliance standards.
Qualifications: Bachelor's degree or higher in computer science, engineering, or a related field. 5+ years in data engineering; 2+ with vector databases and embeddings. Deep understanding of modern vector database solutions and search architectures. Proficient in Python, SQL, and relevant ETL/ELT frameworks (Airflow, dbt, etc.). Experience with AWS data services. Experience leveraging AI-powered coding assistants (e.g., GitHub Copilot, Copilot X, ChatGPT Code Interpreter, Amazon CodeWhisperer) to enhance productivity in day-to-day software development activities, including code generation, refactoring, and documentation. Familiarity with best practices for integrating AI coding assistants into team workflows while maintaining code quality, security, and regulatory compliance. Knowledge of Document AI and LLM pipelines is a plus.
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification document form upon hire.
Remote work
Location: Weston
Category: Technology