Cohort Program

RAG Engineering in Production: Architecture, Scale, and Performance

Six weeks to a production RAG system that holds up under real enterprise workloads

About this cohort program

Naive RAG works in demos. It breaks in production. You may have built a basic retrieval pipeline and know it isn't good enough: it retrieves the wrong chunks, produces hallucinations you can't debug, and has latency that kills the user experience. This course gives you an architecture that holds up at scale.

What you will learn

Implement hybrid BM25 and dense vector search with a reranking layer that improves retrieval precision
Design chunking strategies, metadata schemas, and indexing pipelines for complex, mixed-content document sets
Build a full evaluation harness that measures retrieval recall, answer faithfulness, and response quality
Deploy a production RAG system on pgvector with query caching, latency monitoring, and graceful degradation

Who this is for

Engineers who have built a basic RAG pipeline and know it falls short in real production usage
AI architects designing knowledge retrieval systems for enterprise or high-stakes workloads
Teams building internal AI tools over large proprietary document sets who need retrieval that actually works

By the end

Before: Basic retrieval that surfaces the wrong chunks
After: Hybrid search with reranking that consistently finds the right context

Before: RAG hallucinations you cannot diagnose
After: An evaluation harness that scores retrieval and answer quality automatically

Before: Latency spikes that kill user experience
After: Query caching and graceful degradation built into the production stack

Syllabus

Session 1 · Wednesday, July 8, 2026
Why Naive RAG Fails: Chunking Strategies and Embedding Models
Session 2 · Wednesday, July 15, 2026
Vector Databases, Sparse Retrieval, and Dense Retrieval
Session 3 · Wednesday, July 22, 2026
Hybrid Search, Cross-Encoder Reranking, and Metadata Filtering
Session 4 · Wednesday, July 29, 2026
Query Transformation, Multi-Vector Indexing, and Parent-Child Chunking
Session 5 · Wednesday, August 5, 2026
Building the Evaluation Harness and Agentic RAG
Session 6 · Wednesday, August 12, 2026
Caching, Multi-Tenant Architecture, Production Deployment, and Capstone

About Nikolai

Nikolai Petrov

AI Infrastructure Architect & LlamaIndex Contributor

Vetted by Maram

Nikolai has designed and deployed retrieval-augmented generation systems for enterprise clients in banking, legal, and healthcare since 2021. A core contributor to LlamaIndex, he runs the Production RAG open-source repository and publishes a weekly technical deep-dive on retrieval architecture read by 30,000 engineers.

