Transformer

A neural network architecture introduced in 2017 and the foundation of nearly every modern LLM.

What is a Transformer?

The transformer is arguably the most important neural network architecture of the last decade. Introduced by Google researchers in the paper "Attention Is All You Need" (2017), it has largely replaced earlier sequence models (RNNs, LSTMs) for most language tasks.

The core mechanism is **attention**: when processing each position, the model can look directly at every other position in the input and weigh how relevant each one is. This lets it capture long-range dependencies in text and parallelise training across positions, which RNNs cannot do because they process tokens one at a time.
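The attention step described above can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions, not a production implementation: it uses a single attention head, skips the learned query/key/value projections and any masking, and the function names and toy numbers are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:  # each position is computed independently of the others
        row = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * v[d] for w, v in zip(row, values))
                 for d in range(len(values[0]))]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights

# Toy input: four positions with 2-dimensional vectors (illustrative numbers).
# Positions 0 and 3 point the same way, so they should attend to each other
# even though they sit at opposite ends of the sequence.
x = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]]
outputs, weights = attention(x, x, x)

for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Position 0 gives position 3 as much weight as it gives itself, regardless of the distance between them, which is the long-range behaviour in miniature. The loop over `queries` also has no dependency between iterations, so real implementations compute all positions at once as matrix multiplications.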

Every LLM you have heard of — GPT, Claude, Gemini, Llama — is a transformer at its core. Most non-text models (image, audio, video) in 2026 also use transformer variants.

Why this matters

Understanding the transformer architecture is the difference between using LLMs and understanding how they work. That knowledge is critical for senior ML and Gen AI roles.

Real-world example (India)

When Claude reads a 200-page contract, the transformer's attention mechanism is what lets it refer back to a clause on page 12 while writing about page 187. Older RNN architectures had to squeeze everything read so far into a fixed-size hidden state, so details from that far back were often lost.
