Context Window

The maximum number of tokens an LLM can process in a single prompt + response.

What is a Context Window?

Every LLM has a fixed context window: the maximum number of tokens (roughly, word fragments) it can handle at once. GPT-4o has a 128K-token window (~96,000 words of English text). Claude Opus 4 has 200K standard, up to 1 million via the 1M-context tier. Gemini 1.5 Pro has 2 million.
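
The token-to-word figures above can be reproduced with a quick back-of-the-envelope conversion. The sketch below assumes roughly 0.75 English words per token, which is the ratio implied by "128K tokens, ~96,000 words"; the real ratio depends on the tokenizer and the language, and the function names are illustrative rather than part of any library.

```python
import math

# Assumption: ~0.75 English words per token, the ratio implied by
# "128K tokens ~ 96,000 words". Real ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate token count for `words` English words, rounded up."""
    return math.ceil(words / WORDS_PER_TOKEN)


print(tokens_to_words(128_000))  # 96000
print(words_to_tokens(1_500))    # 2000
```

For anything beyond a rough estimate, count tokens with the tokenizer of the model you actually call.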

When your prompt plus the expected response exceeds the context window, the request either fails with an error or, depending on the API or framework in front of the model, part of the input is silently truncated. For RAG systems, this is the primary constraint on how many retrieved chunks you can include.
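
A minimal sketch of that budgeting arithmetic: subtract the fixed prompt overhead and the tokens reserved for the response from the window, then see how many retrieved chunks fit in what is left. All names and numbers here are hypothetical, and the token counts are assumed to come from the model's own tokenizer.

```python
def max_chunks(context_window: int, prompt_overhead: int,
               reserved_output: int, chunk_tokens: int) -> int:
    """How many equally sized chunks fit after reserving room for the
    system prompt/question (prompt_overhead) and the response."""
    available = context_window - prompt_overhead - reserved_output
    if available <= 0 or chunk_tokens <= 0:
        return 0
    return available // chunk_tokens


def pack_chunks(chunk_sizes: list[int], budget: int) -> list[int]:
    """Take chunks in ranked order until the next one would not fit.
    Returns the indices of the chunks that were kept."""
    kept, used = [], 0
    for index, size in enumerate(chunk_sizes):
        if used + size > budget:
            break  # stop at the first overflow to preserve ranking order
        kept.append(index)
        used += size
    return kept


# 128K window, 2K for instructions + question, 4K reserved for the answer
print(max_chunks(128_000, 2_000, 4_000, 500))   # 244
print(pack_chunks([400, 600, 500, 300], 1_200))  # [0, 1]
```

Stopping at the first chunk that does not fit keeps the retriever's ranking intact; skipping it and trying smaller chunks further down would pack more tokens at the cost of including lower-ranked material.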

Bigger context is not always better. Models can suffer from "lost in the middle": degraded recall of content placed in the middle of a long context, compared with content near the start or end. Most production systems still chunk and retrieve rather than dumping everything into a 1M context.
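
As an illustration of the "chunk" half of chunk and retrieve, here is a minimal sketch of fixed-size chunking with overlap. It splits on whitespace and counts words rather than tokens to stay dependency-free; the function name and default sizes are assumptions for the example, not a recommended configuration.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.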

Why this matters

The context window directly limits what your Gen AI product can do. Engineering decisions (chunking strategy, RAG vs. full-document prompting, prompt caching) all hinge on it.

Real-world example (India)

A Hyderabad financial-research firm puts 800-page Indian regulatory filings into Claude's 200K context to answer compliance questions. The same task with a 32K-context model required complex RAG plumbing.
