Cohort Program
RAG Engineering in Production: Architecture, Scale, and Performance
Six weeks to a production RAG system that holds up under real enterprise workloads
About this cohort program
Naive RAG works in demos. It breaks in production. If you've built a basic retrieval pipeline and know it isn't good enough (wrong chunks retrieved, hallucinations you can't debug, latency that kills UX), this course gives you an architecture that holds up at scale.
What you will learn
- Implement hybrid BM25 and dense vector search with a reranking layer that substantially improves retrieval precision
- Design chunking strategies, metadata schemas, and indexing pipelines for complex, mixed-content document sets
- Build a full evaluation harness that measures retrieval recall, answer faithfulness, and response quality
- Deploy a production RAG system on pgvector with query caching, latency monitoring, and graceful degradation
Who this is for
- Engineers who have built a basic RAG pipeline and know it falls short in real production usage
- AI architects designing knowledge retrieval systems for enterprise or high-stakes workloads
- Teams building internal AI tools over large proprietary document sets who need retrieval that actually works
By the end
Before: Basic retrieval that surfaces the wrong chunks
After: Hybrid search with reranking that consistently finds the right context
Before: RAG hallucinations you cannot diagnose
After: An evaluation harness that scores retrieval and answer quality automatically
Before: Latency spikes that kill user experience
After: Query caching and graceful degradation built into the production stack
Syllabus
Session 1 · Wednesday, July 8, 2026
Why Naive RAG Fails: Chunking Strategies and Embedding Models
Session 2 · Wednesday, July 15, 2026
Vector Databases, Sparse Retrieval, and Dense Retrieval
Session 3 · Wednesday, July 22, 2026
Hybrid Search, Cross-Encoder Reranking, and Metadata Filtering
Session 4 · Wednesday, July 29, 2026
Query Transformation, Multi-Vector Indexing, and Parent-Child Chunking
Session 5 · Wednesday, August 5, 2026
Building the Evaluation Harness and Agentic RAG
Session 6 · Wednesday, August 12, 2026
Caching, Multi-Tenant Architecture, Production Deployment, and Capstone
About Nikolai
Nikolai Petrov
AI Infrastructure Architect & LlamaIndex Contributor
Vetted by Maram
Nikolai has designed and deployed retrieval-augmented generation systems for enterprise clients in banking, legal, and healthcare since 2021. A core contributor to LlamaIndex, he runs the Production RAG open-source repository and publishes a weekly technical deep-dive on retrieval architecture read by 30,000 engineers.
