Context Window
A context window is the maximum number of tokens a language model can process in a single request, covering both the input prompt and the generated output.
The context window is the model's working memory for a given inference call. Everything the model reasons over has to fit inside it. System instructions, retrieved documents, conversation history, tool definitions, and the final user question all consume tokens from the same budget.
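The shared-budget arithmetic can be sketched in a few lines. This is a simplified illustration, not a real API: the whitespace tokenizer stands in for a real subword tokenizer, and the window size, part names, and output reserve are all illustrative assumptions.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in: real models use subword (e.g. BPE) tokenizers."""
    return len(text.split())

def remaining_budget(window: int, parts: dict[str, str], reserve_for_output: int) -> int:
    """Tokens left for retrieved documents after the fixed parts and the
    space reserved for the model's generated output."""
    used = sum(count_tokens(p) for p in parts.values())
    return window - used - reserve_for_output

# Every part of the prompt draws from the same window.
parts = {
    "system": "You are a support assistant. Answer from the provided documents only.",
    "history": "User: earlier question... Assistant: earlier answer...",
    "question": "Why was my invoice higher this month?",
}

left = remaining_budget(window=8000, parts=parts, reserve_for_output=1024)
print(left)
```

Note that the output reserve comes off the same budget: a model cannot generate a long answer if the prompt has already consumed nearly the whole window.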
Window sizes grew fast between 2023 and 2026, from a few thousand tokens to over a million. Bigger windows sound like a fix for context limitations. They are not. Models have well-documented failure modes at long context lengths, including the "lost in the middle" problem, where content placed in the middle of a long input is attended to less reliably than content at the start or end. Retrieval quality also degrades as the window fills with noise.
The real constraint is not how much can fit. The real constraint is how much should fit. A well-engineered 20,000-token context almost always outperforms a lazily stuffed 500,000-token context on the same task.
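One concrete way to spend the budget deliberately is to rank candidate chunks by relevance and pack only as many as fit, rather than stuffing everything in. A minimal sketch, assuming relevance scores already exist and approximating token counts by whitespace splitting:

```python
def pack_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit the token budget.

    chunks: (relevance_score, text) pairs. Token cost is approximated
    by whitespace splitting, a stand-in for a real tokenizer.
    """
    picked, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = len(text.split())
        if used + cost <= budget:
            picked.append(text)
            used += cost
    return picked
```

A lower-scoring chunk is simply dropped when it would overflow the budget; the point is that exclusion is a deliberate decision, not an accident of window size.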
The Amdahl view
Treating the context window as free space is the fastest way to get worse model output. The teams that win treat it as expensive and spend it carefully. Every token in the window should earn its place. Dumping raw CRM exports and call transcripts into a million-token window is not context engineering. It is a signal-to-noise disaster wearing a new coat.
Related terms
- Context Engineering (AI Infrastructure): the discipline of deciding what a language model should know at inference time, including the source data, structure, and ordering of its working memory.
- Context Bloat (AI Infrastructure): the degradation in model output that happens when too much raw or irrelevant data is stuffed into the context window, drowning the signal in noise.
- Retrieval Augmented Generation (RAG) (AI Infrastructure): a pattern where a system retrieves relevant documents from an external source, injects them into the model's prompt, and has the model answer from the retrieved material rather than from parametric memory.
- Hallucination (AI Infrastructure): output from a language model that looks plausible and fluent but is factually incorrect, unsupported by source material, or fabricated entirely.