Cohort Program

RAG Engineering in Production: Architecture, Scale, and Performance

Six weeks to a production RAG system that holds up under real enterprise workloads

About this cohort program

Naive RAG works in demos and breaks in production. If you've built a basic retrieval pipeline and know it falls short (wrong chunks retrieved, hallucinations you can't debug, latency that hurts the user experience), this course covers the architecture you need for it to hold up at scale.

What you will learn

  • Implement hybrid search that combines BM25 with dense vector retrieval, then add a reranking layer to improve precision in the top results
  • Design chunking strategies, metadata schemas, and indexing pipelines for complex, mixed-content document sets
  • Build a full evaluation harness that measures retrieval recall, answer faithfulness, and response quality
  • Deploy a production RAG system on PostgreSQL with pgvector, including query caching, latency monitoring, and graceful degradation
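
To make the first outcome concrete, here is a minimal sketch of one common way to merge a BM25 ranking with a dense-vector ranking: reciprocal rank fusion (RRF). The listing does not say which fusion method the course teaches, so treat RRF, the function name, the document IDs, and the constant k=60 as illustrative assumptions rather than course material.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption
    here, not a value taken from the course).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, and a reranker would then typically rescore only the first few fused results.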

Who this is for

  • Engineers who have built a basic RAG pipeline and know it falls short in real production usage
  • AI architects designing knowledge retrieval systems for enterprise or high-stakes workloads
  • Teams building internal AI tools over large proprietary document sets who need retrieval that actually works

By the end

Before: Basic retrieval that surfaces the wrong chunks
After: Hybrid search with reranking that consistently finds the right context

Before: RAG hallucinations you cannot diagnose
After: An evaluation harness that scores retrieval and answer quality automatically

Before: Latency spikes that kill user experience
After: Query caching and graceful degradation built into the production stack
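
As a rough illustration of the retrieval side of an evaluation harness, the sketch below computes recall@k over a small labeled set. The function names, chunk IDs, and test cases are hypothetical; the listing does not specify which metrics or tooling the course uses beyond retrieval recall, answer faithfulness, and response quality.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(cases, k):
    """Average recall@k over (retrieved, relevant) pairs."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in cases) / len(cases)


# Hypothetical labeled cases: (retriever output, ground-truth relevant chunks).
cases = [
    (["c1", "c7", "c3"], {"c1", "c3"}),  # both relevant chunks retrieved
    (["c9", "c2", "c4"], {"c4", "c5"}),  # one of two relevant chunks retrieved
]

score = mean_recall_at_k(cases, k=3)
print(score)
```

Tracking a number like this per release makes a retrieval regression visible before it shows up as a wrong answer.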

Syllabus

  1. Session 1 · Wednesday, July 8, 2026

    Why Naive RAG Fails: Chunking Strategies and Embedding Models

  2. Session 2 · Wednesday, July 15, 2026

    Vector Databases, Sparse Retrieval, and Dense Retrieval

  3. Session 3 · Wednesday, July 22, 2026

    Hybrid Search, Cross-Encoder Reranking, and Metadata Filtering

  4. Session 4 · Wednesday, July 29, 2026

    Query Transformation, Multi-Vector Indexing, and Parent-Child Chunking

  5. Session 5 · Wednesday, August 5, 2026

    Building the Evaluation Harness and Agentic RAG

  6. Session 6 · Wednesday, August 12, 2026

    Caching, Multi-Tenant Architecture, Production Deployment, and Capstone

About Nikolai

Nikolai Petrov

AI Infrastructure Architect & LlamaIndex Contributor

Vetted by Maram

Nikolai has designed and deployed retrieval-augmented generation systems for enterprise clients in banking, legal, and healthcare since 2021. A core contributor to LlamaIndex, he runs the Production RAG open-source repository and publishes a weekly technical deep-dive on retrieval architecture read by 30,000 engineers.
