AI Infrastructure

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a pattern where a system retrieves relevant documents from an external source, injects them into the model's prompt, and has the model answer from the retrieved material rather than from parametric memory.

RAG was the first widely adopted answer to hallucination. Instead of asking a model what it remembers about a topic, retrieve the source material and ask the model to answer from that material. The retrieved chunks become the model's grounding. This works because the model's job shifts from recall to reading comprehension, which it is much better at.
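The shift from recall to reading comprehension happens entirely in the prompt. A minimal sketch of that grounding step, with hypothetical example passages and a deliberately plain instruction format:

```python
def grounded_prompt(question: str, passages: list[str]) -> str:
    # Number the passages so the model can cite them, and instruct it to
    # answer only from the supplied text, not from what it remembers.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If the answer is not in the passages, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Hypothetical retrieved passage; in practice this comes from the
# retrieval layer described below.
prompt = grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

The exact wording of the instruction varies by team and model, but the structure is the point: the question arrives bundled with its own source material.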

The typical stack involves chunking source documents, embedding the chunks into vectors, storing them in a vector database, embedding the user query at request time, pulling the top-k nearest chunks, and concatenating them into the prompt. Modern RAG systems add hybrid search (vector plus keyword), reranking, query expansion, and citation tracking.
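The basic stack can be sketched end to end in a few functions. This is a toy, not an implementation: the sparse token-count "embedding" stands in for a real embedding model, the fixed-width chunker for a real splitter, and the in-memory list for a vector database, but the index-time/request-time shape of the pipeline is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Stand-in embedding: unit-normalized token counts. A real system
    # would call a dense embedding model here; this keeps the example
    # dependency-free while preserving the pipeline's shape.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-width chunking; production systems usually split on
    # sentence or section boundaries, often with overlap.
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k(store, query: str, k: int = 2) -> list[str]:
    # Embed the query at request time and rank stored chunks by cosine
    # similarity (a dot product, since all vectors are unit-normalized).
    q = embed(query)
    scored = sorted(
        ((sum(w * vec.get(tok, 0.0) for tok, w in q.items()), text)
         for text, vec in store),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer from the context below.\n{context}\nQuestion: {query}"

# Index time: chunk each document and store (chunk, vector) pairs.
docs = [
    "The billing API retries failed charges after 24 hours.",
    "Support tickets are triaged by severity before assignment.",
]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

# Request time: retrieve the nearest chunks and inject them.
query = "When are failed charges retried?"
prompt = build_prompt(query, top_k(store, query, k=1))
```

The additions listed above slot into this skeleton at predictable points: hybrid search replaces the single similarity ranking with a fusion of vector and keyword scores, a reranker re-scores the top-k candidates before prompt assembly, and citation tracking tags each chunk with its source so the answer can point back to it.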

RAG solved the problem of grounding models in static corpora. It did not solve context engineering. It does not give you an ontology. It does not give you a citation graph. It does not tell the model which of the retrieved chunks is most trustworthy. It is one layer of a working system, not the whole thing.

The Amdahl view

RAG is table stakes. Claiming to have RAG in 2026 is like claiming to have a website in 2006. The interesting question is what sits on top of the retrieval. The teams that win have an ontology above the retrieval layer, a citation graph beside it, and a feedback loop underneath it. Anyone selling 'AI grounded in your data' without those three pieces is selling the same RAG demo that every other vendor is shipping.
