RAG (Retrieval-Augmented Generation)

A pattern in which relevant documents are retrieved and given to an LLM before it answers, so the answer is grounded in factual, up-to-date sources rather than training data alone.

What is RAG (Retrieval-Augmented Generation)?

Out of the box, an LLM only knows what was in its training data. RAG addresses this by giving the LLM access to a searchable document store at query time. When a user asks a question, the system retrieves the most relevant documents and includes them in the LLM's prompt.
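As a minimal sketch of the "include them in the prompt" step, the Python below assembles an augmented prompt from a question and passages that have already been retrieved. The function name `build_rag_prompt`, the instruction wording, the `[n]` citation markers and the sample passages are illustrative assumptions, not a standard template; real systems tune this prompt to their model and use case.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Hypothetical helper: number the retrieved passages, then append the question."""
    # Numbering each passage lets the model cite its sources as [1], [2], ...
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite passages by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Illustrative passages standing in for the output of a retrieval step.
    chunks = [
        "Leave requests must be submitted 5 working days in advance.",
        "Unused leave can be carried forward for one calendar year.",
    ]
    print(build_rag_prompt("Can I carry forward unused leave?", chunks))
```

The resulting string is what gets sent to the LLM; telling the model to answer only from the context and to admit when it is insufficient is one common way to keep answers grounded.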

A RAG pipeline has three stages:

1. **Indexing:** split documents into chunks, generate an embedding for each chunk, and store the embeddings in a vector database.
2. **Retrieval:** convert the user's question into an embedding with the same model, then find the chunks whose embeddings are closest to it (typically by cosine similarity).
3. **Generation:** pass the retrieved chunks and the question to the LLM, which produces an answer grounded in the retrieved content.
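The indexing and retrieval stages can be sketched in a few lines of Python. This is a toy under stated assumptions: a bag-of-words word-count vector stands in for a real embedding model, and a brute-force in-memory list stands in for a vector database. The names `chunk_text`, `embed`, `cosine` and `InMemoryVectorStore`, the chunk size and overlap values, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Indexing: split text into overlapping windows of `chunk_size` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in (assumption) for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force search over all chunks."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Indexing: store each chunk alongside its vector.
        self.entries.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: embed the question the same way, then rank chunks by similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


docs = [
    "Employees may carry forward up to ten days of unused leave into the next year.",
    "The office cafeteria is open from 8 am to 6 pm on weekdays.",
    "Laptop replacement requests go through the IT helpdesk portal.",
]
store = InMemoryVectorStore()
for doc in docs:
    for chunk in chunk_text(doc):
        store.add(chunk)

print(store.search("How many days of unused leave can I carry forward?", k=1))
```

Here the leave question ranks the first document highest because it shares the most words with it. A production pipeline would replace `embed` with a learned embedding model (so paraphrases also match), use an approximate-nearest-neighbour index instead of sorting every chunk, and then pass the retrieved chunks plus the question to the LLM for the generation stage.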

RAG is among the most widely deployed production Gen AI patterns in 2026. It is the approach behind most "chat with your docs" products, such as Notion AI and Glean, as well as many customer-support bots and internal knowledge tools.

Why this matters

Building production-quality RAG is one of the most common tasks asked of Gen AI engineers in India. Companies want it for internal knowledge bases, customer support, and document analysis.

Real-world example (India)

A Pune-based legal-tech company built a RAG system over 50,000 Indian court judgements. Lawyers ask questions like "what is the precedent for software-licence disputes in Karnataka?" and get cited, accurate answers in seconds.
