AI Infrastructure / LLM Back-End Engineer
This is a remote position.
Job Title: AI Infrastructure / LLM Back-End Engineer
Location: Remote
Team: AI Infrastructure & Engineering
Employment Type: Full-Time
*Superstaffed.ai is part of Remote Workmate PTY LTD
About the Role:
We’re hiring a Back-End Software Engineer to lead the design and development of high-performance infrastructure that powers AI-first applications. In your first 90 days, you'll take ownership of one or more back-end product verticals—from LLM integration to API delivery—working directly with our product and ML teams to deploy real-time intelligent features used by global clients.
You'll be building scalable APIs, vector-based search pipelines, and inference systems using tools like OpenAI, Hugging Face, and LangChain. Your goal: reduce latency, optimize infrastructure for cost, and continuously ship measurable improvements.
This role is ideal for engineers who thrive in autonomous environments, think in systems, and deliver fast. You’ll work with a lean, high-output team where your decisions directly impact feature performance, reliability, and user experience.
Ready to Apply?
If this opportunity excites you and your skills align with the role, we'd love to learn more about you.
You can begin the application process right away by completing a short, self-paced video interview with “Alex,” our AI interviewer. This helps us fairly assess your experience, communication style, and fit for the role.
Start the interview here:
*Note: Applications without a video interview will not be processed.
Responsibilities:
Design and maintain APIs for AI-powered features (FastAPI, Flask)
Integrate and fine-tune LLMs (OpenAI, Hugging Face, LangChain)
Build pipelines for vector embeddings, semantic search, and RAG
Optimize back-end systems for latency, scalability, and cost
Collaborate with ML engineers to deploy and monitor inference systems
Implement observability (Sentry, Prometheus, Grafana) for debugging
Manage CI/CD and infrastructure-as-code (Docker, GitHub Actions, Terraform)
Own full product verticals from API to deployment
Requirements:
3+ years in back-end/API engineering (Python, FastAPI/Flask)
Experience with PostgreSQL, Docker, and containerized development
Proven use of OpenAI APIs, Hugging Face, LangChain, or Transformers
Familiar with vector databases like Pinecone, Qdrant, or Weaviate
Experience in CI/CD, observability, and monitoring systems
Bonus: Knowledge of asyncio, aiohttp, Kubernetes (k8s), or serverless environments
Strong communication, async-first documentation, and remote collaboration skills
Performance Milestones:
First 30 Days
Set up staging and dev environments
Review codebase and system architecture
Deploy test API integrating a basic OpenAI or HF model
By Day 60
Launch a production-ready AI feature (e.g., vector store or RAG endpoint)
Improve model response latency by 30–50%
Implement >80% test coverage
By Day 90
Own back-end infrastructure for a product line
Reduce compute costs through caching/async strategies
Contribute to LLM scaling roadmap
Success Metrics (KPOs):
API latency within target on core services
Uptime ≥ 99.5% on core services
Test coverage > 85%
1–2 production deployments per week
LLM inference ≤ 3s with retries/failure handling
Tech Stack:
AI Platforms: OpenAI, Hugging Face, LangChain
Frameworks: FastAPI, Flask, SQLAlchemy
Databases: PostgreSQL, Redis, Pinecone, Qdrant
DevOps: Docker, GitHub Actions, Terraform
Monitoring: Prometheus, Grafana, Sentry
Collaboration: Slack, Notion, ChatGPT