AI Infrastructure

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a pattern where a system retrieves relevant documents from an external source, injects them into the model's prompt, and has the model answer from the retrieved material rather than from parametric memory.

RAG was the first widely adopted answer to hallucination. Instead of asking a model what it remembers about a topic, retrieve the source material and ask the model to answer from that material. The retrieved chunks become the model's grounding. This works because the model's job shifts from recall to reading comprehension, which it is much better at.
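The shift from recall to reading comprehension happens entirely in the prompt. A minimal sketch of that grounding step, with hypothetical example passages and a deliberately plain instruction format:

```python
def grounded_prompt(question: str, passages: list[str]) -> str:
    # Number the passages so the model can cite them, and instruct it to
    # answer only from the supplied text, not from what it remembers.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If the answer is not in the passages, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Hypothetical retrieved passage; in practice this comes from the
# retrieval layer described below.
prompt = grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

The exact wording of the instruction varies by team and model, but the structure is the point: the question arrives bundled with its own source material.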

The typical stack involves chunking source documents, embedding the chunks into vectors, storing them in a vector database, embedding the user query at request time, pulling the top-k nearest chunks, and concatenating them into the prompt. Modern RAG systems add hybrid search (vector plus keyword), reranking, query expansion, and citation tracking.
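The basic stack can be sketched end to end in a few functions. This is a toy, not an implementation: the sparse token-count "embedding" stands in for a real embedding model, the fixed-width chunker for a real splitter, and the in-memory list for a vector database, but the index-time/request-time shape of the pipeline is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Stand-in embedding: unit-normalized token counts. A real system
    # would call a dense embedding model here; this keeps the example
    # dependency-free while preserving the pipeline's shape.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-width chunking; production systems usually split on
    # sentence or section boundaries, often with overlap.
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k(store, query: str, k: int = 2) -> list[str]:
    # Embed the query at request time and rank stored chunks by cosine
    # similarity (a dot product, since all vectors are unit-normalized).
    q = embed(query)
    scored = sorted(
        ((sum(w * vec.get(tok, 0.0) for tok, w in q.items()), text)
         for text, vec in store),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer from the context below.\n{context}\nQuestion: {query}"

# Index time: chunk each document and store (chunk, vector) pairs.
docs = [
    "The billing API retries failed charges after 24 hours.",
    "Support tickets are triaged by severity before assignment.",
]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

# Request time: retrieve the nearest chunks and inject them.
query = "When are failed charges retried?"
prompt = build_prompt(query, top_k(store, query, k=1))
```

The additions listed above slot into this skeleton at predictable points: hybrid search replaces the single similarity ranking with a fusion of vector and keyword scores, a reranker re-scores the top-k candidates before prompt assembly, and citation tracking tags each chunk with its source so the answer can point back to it.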

RAG solved the problem of grounding models in static corpora. It did not solve context engineering. It does not give you an ontology. It does not give you a citation graph. It does not tell the model which of the retrieved chunks is most trustworthy. It is one layer of a working system, not the whole thing.

The Amdahl view

RAG is table stakes. Claiming to have RAG in 2026 is like claiming to have a website in 2006. The interesting question is what sits on top of the retrieval. The teams that win have an ontology above the retrieval layer, a citation graph beside it, and a feedback loop underneath it. Anyone selling 'AI grounded in your data' without those three pieces is selling the same RAG demo that every other vendor is shipping.
