Context Bloat
Context bloat is the degradation in model output that happens when too much raw or irrelevant data is stuffed into the context window, drowning the signal in noise.
Context bloat is what happens when teams confuse capacity with quality. The window will accept whatever you put in it. That does not mean the model will use it well. As irrelevant tokens accumulate, attention gets diluted, retrieval precision drops, and the model starts pulling from weaker signals. The output looks plausible and is quietly wrong more often.
The symptoms are familiar. Answers drift off topic. Citations point to the wrong source. The model contradicts itself between turns. Teams respond by swapping in a bigger model or a bigger window, which usually makes the problem worse because it gives them more space to repeat the same mistake at a larger scale.
The actual fix is upstream. Structure the source data before it ever reaches the model. Retrieve selectively. Cite every claim. Prune aggressively. The best context is not the most context. It is the least context that covers the task.
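One way to picture "the least context that covers the task" is a retrieval step with a hard token budget. The sketch below is illustrative, not a production retriever: the keyword-overlap scoring and the 4-characters-per-token estimate are stand-in assumptions, and real systems would use embeddings and a proper tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose (assumption).
    return max(1, len(text) // 4)

def relevance(query: str, chunk: str) -> float:
    # Toy relevance score: fraction of query words present in the chunk.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def build_context(query: str, chunks: list[str], budget_tokens: int) -> list[str]:
    # Rank chunks by relevance, then greedily pack the best ones
    # until the token budget is spent. Everything else is pruned.
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if relevance(query, chunk) == 0.0:
            continue  # drop zero-signal chunks even if budget remains
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue
        selected.append(chunk)
        used += cost
    return selected
```

The shape is what matters: score every candidate, enforce a budget, and drop anything with no signal rather than letting it ride along because the window has room.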
The Amdahl view
Context bloat is why 'just connect your tools to Claude' does not work for GTM. The fix is not a bigger window or a better model. The fix is structuring the source data into an ontology before it ever hits the model. Every team Amdahl talks to that complains about unreliable AI output is suffering from context bloat. They keep adding sources. We keep telling them to add structure instead.
Related terms
- Context Engineering (AI Infrastructure): the discipline of deciding what a language model should know at inference time, including the source data, structure, and ordering of its working memory.
- Context Window (AI Infrastructure): the maximum number of tokens a language model can process in a single request, covering both the input prompt and the generated output.
- Ontology (AI Infrastructure): a structured map of the concepts, entities, and relationships in a domain, used to give a language model a consistent vocabulary and schema for reasoning about source data.
- Retrieval Augmented Generation (RAG) (AI Infrastructure): a pattern where a system retrieves relevant documents from an external source, injects them into the model's prompt, and has the model answer from the retrieved material rather than from parametric memory.
- Hallucination (AI Infrastructure): output from a language model that looks plausible and fluent but is factually incorrect, unsupported by source material, or fabricated entirely.