RAG Doesn't Fix Hallucination. Neither Does Anything Else

There is a particular kind of intellectual dishonesty that the AI industry has normalised — dressing up workarounds as solutions, and calling incremental noise reduction "solving hallucination."

I want to be precise about why this bothers me. Not because the work is useless. Some of it is useful. But because the framing is wrong, and wrong framing at scale produces an entire generation of practitioners who mistake the map for the territory.

Let's start with the foundational claim most people refuse to say out loud.


Hallucination Is an Architectural Consequence, Not a Defect

An autoregressive transformer is optimised to produce the most plausible next token. Not the most true next token. Truth is not in the objective function. Plausibility is.

In the training distribution, plausibility and truth correlate well enough that the model appears to "know things." At the edges of the distribution — novel queries, rare facts, compositional reasoning, domain-specific precision — plausibility and truth diverge. The model continues producing high-confidence, fluent output regardless. Because that is what it was trained to do.

This is not a bug someone can patch. It is a direct consequence of what the system is optimising for.
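
The point can be made concrete with a deliberately tiny count-based bigram model. The corpus below is constructed for illustration; what matters is that greedy decoding maximises corpus frequency, and truth never enters the computation:

```python
from collections import Counter, defaultdict

# Toy corpus, constructed for illustration: plausibility tracks frequency,
# not truth. The false continuation is simply more common here.
corpus = [
    "the capital of australia is sydney",    # common misconception, repeated
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",  # the true statement, rarer
]

# Count bigram transitions: P(next | prev) is proportional to count(prev, next).
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def most_plausible_next(prev):
    """Greedy decoding: return the highest-probability next token.
    The objective is frequency in the corpus; truth never appears."""
    return counts[prev].most_common(1)[0][0]

print(most_plausible_next("is"))  # → "sydney": fluent, confident, false
```

Scale changes the fluency, not the objective. A frontier model is this loop with a vastly better probability estimate.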

Once you internalise that, a large fraction of the "hallucination reduction" industry becomes immediately suspicious. Because most of it does not touch the objective function. It works around the edges of a probability distribution and reports the result as a structural fix.

Let me go through the four most commonly cited approaches and apply some actual pressure.


RAG

The honest value envelope of Retrieval-Augmented Generation:

  • Facts past the training cutoff
  • Private or proprietary data not in the training corpus
  • Narrowing context to highly relevant material

That is it. That is the complete list of problems RAG structurally addresses.

The argument being implicitly made when people say "RAG reduces hallucination" is: the current training corpus — which is essentially all publicly available human knowledge plus private data channels — was not sufficient, but plugging in an API retrieval layer over a slice of that same corpus solves the problem.

When you state it that way, the claim weakens considerably.

Worse: RAG does not change the inference mechanism. The model can hallucinate about the retrieved content — misinterpret it, blend it with prior beliefs, attend selectively to the parts that confirm what it was already going to say. You have handed a confabulation engine better source material and hoped it behaves differently.

What actually happens when RAG "reduces hallucination" is this: it reduces a specific, narrow subclass of hallucinations caused by absent or stale facts. That is a real and useful thing. But it is not hallucination reduction — it is hallucination relocation. The structural problem is identical.

And if your retrieval is imprecise, you have not narrowed the problem. You have injected noise into the context window and made things worse.
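
A toy sketch of that failure mode, using a naive word-overlap retriever over two made-up documents (the scoring function and documents are illustrative, not a real retrieval stack):

```python
# A deliberately naive retriever: score documents by word overlap with the
# query. Lexical overlap is not relevance, so the top hit can be noise.
documents = [
    "Apple reported record iPhone revenue in the fourth quarter.",
    "An apple a day keeps the doctor away, according to the proverb.",
]

def retrieve(query, docs):
    """Return the document with the highest word-overlap score."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

# Intent: a question about the company. The proverb wins on overlap,
# and that noise goes straight into the context window.
context = retrieve("apple launch day details", documents)
print(context)
```

Production retrievers are far better than this, but the structural point survives: retrieval quality bounds context quality, and the generator downstream treats whatever it receives as material to be fluent about.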


LLM + Formal Verifier

The canonical version of this argument: generate code with an LLM, run the type checker and test suite, use the result as a signal. The verifier is deterministic, external, and provides a truth anchor.

Except — type checkers and test suites have existed for forty years. They have not solved software engineering. Why?

Because formal tools verify conformance to specification, not correctness of the specification itself. A well-typed, fully green test suite can be a precise, deterministic description of a wrong system.

Your tests tell you what your system agrees with. They do not tell you what is true. The space of possible inputs is infinite; tests are a finite sample over that space. They are a bet, formalised. Rice's Theorem tells you that no non-trivial semantic property of a program is decidable in general. Tests do not escape that.
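
A minimal illustration, using a hypothetical `days_in_month` function: the code is well-formed, the test suite is green, and the system is wrong on the input the finite sample never touched:

```python
def days_in_month(month):
    """Intended spec: days in a Gregorian month, non-leap year.
    The implementation is wrong for February, but the finite test
    sample below never checks it."""
    return 31 if month in (1, 3, 5, 7, 8, 10, 12) else 30

# The test suite: a finite sample over the input space. All green.
assert days_in_month(1) == 31
assert days_in_month(4) == 30
assert days_in_month(12) == 31

# The untested input where conformance and correctness diverge:
print(days_in_month(2))  # → 30, not 28
```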

So the actual claim of LLM + verifier is: does the generated code satisfy the tests the human wrote? That is a much weaker claim than "is the generated code correct?" The verifier does not introduce truth — it introduces a formalised opinion, which is marginally better than an unformalised one, but not categorically different.

There is one real but narrow benefit: type systems constrain the output space. If the LLM must produce code that type-checks, you have eliminated structurally incoherent programs. That is a filter over the probability distribution, not a truth oracle. Useful. Not a solution.
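
A rough sketch of that filtering effect, using `ast.parse` as a stand-in for a type checker (the candidate strings are invented): the structurally incoherent candidate is rejected, but a semantically wrong one passes untouched:

```python
import ast

# Candidate "generations": strings a model might sample.
candidates = [
    "def area(r): return 3.14159 * r * r",  # valid, plausibly correct
    "def area(r): return 3.14159 * r",      # valid Python, wrong formula
    "def area(r) return 3.14159 * r * r",   # syntax error: filtered out
]

def structural_filter(sources):
    """Keep candidates that parse. A filter over the output space,
    not a truth oracle: parsing checks form, not meaning."""
    kept = []
    for src in sources:
        try:
            ast.parse(src)
            kept.append(src)
        except SyntaxError:
            pass
    return kept

survivors = structural_filter(candidates)
print(len(survivors))  # → 2: the wrong-formula candidate survives the filter
```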


LLM + Symbolic Systems

This one collapses under a single question: if the symbolic system has ground truth, why route through a probabilistic model at all?

Databases, constraint solvers, and math engines are directly queryable. They return deterministic results. In cases where they can answer the question directly, inserting an LLM between the user and the oracle does one thing: it adds a probabilistic translation layer to a deterministic system. You have introduced a new failure mode where none existed before.

The LLM is doing one of two things in this loop:

Natural language to formal query translation. This is a genuinely hard problem — mapping ambiguous human intent to precise formal queries. But the LLM can mis-translate. Text-to-SQL with schema validation is the honest version of this: useful, scoped, with the generated query validated before execution. Even then, the LLM can produce syntactically valid SQL that is semantically wrong and returns a confident, incorrect result.

Handling underspecified problems. When the symbolic system cannot handle ambiguity, the LLM resolves it. But the LLM resolves ambiguity arbitrarily — by sampling from a probability distribution, not by accessing ground truth. You have not grounded the ambiguity. You have hidden it.

The honest architecture: LLM as a structured query generator for well-defined, narrow domains, with the generated artefact validated before execution. Not "LLM reasoning over a truth engine." The reasoning is not happening in the truth engine.
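
The text-to-SQL failure above is easy to reproduce with a toy `sqlite3` schema (the table and the "generated" query are invented for illustration). Both queries are valid, both would pass schema validation, and they answer different questions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'alice'), (3, 'bob');
""")

# User intent: "how many customers placed orders?" The answer is 2.
# A hypothetical model-generated query: syntactically valid, references
# real columns, executes without error, and counts rows instead of
# distinct customers.
generated = "SELECT COUNT(*) FROM orders"
correct   = "SELECT COUNT(DISTINCT customer) FROM orders"

print(conn.execute(generated).fetchone()[0])  # → 3: confident and wrong
print(conn.execute(correct).fetchone()[0])    # → 2
```

Validation catches queries that cannot run. It cannot catch queries that run and answer the wrong question.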


The P vs NP Analogy

This is the argument I find most intellectually interesting to dismantle, because it sounds the most rigorous.

The intuition: in formal systems, verification has structurally lower computational complexity than generation. Sorting is O(n log n), verifying sorted is O(n). Finding a proof is intractable; checking a proof is polynomial. The argument implies that an LLM may be reliably better at verifying answers than generating them, even if it makes errors in generation.
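
The formal asymmetry itself is real and easy to demonstrate: checking sortedness is a different, O(n) computation from producing the sorted order:

```python
def verify_sorted(xs):
    """O(n) check: a structurally different and cheaper computation
    than the O(n log n) work of producing the sorted order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

data = [5, 3, 8, 1]
assert not verify_sorted(data)       # cheap to reject
assert verify_sorted(sorted(data))   # cheap to confirm the expensive work
```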

The analogy does not transfer.

In formal systems, verification is structurally cheaper because verification is a different computational problem with different algorithmic properties. The P vs NP asymmetry is about the nature of the problem, not the capability of the solver.

For an LLM, "verification" is not a different computation. It is next-token prediction over the prompt "is this correct?" against the same training distribution that produced the original answer. The model has a systematic bias toward confirming confident-sounding text — because in human-labelled training data, confident text correlates with being marked correct. The same probability distribution that generated a plausible-but-wrong answer will evaluate that answer favourably.

Empirically, this shows up clearly: LLMs tend to agree with wrong answers presented confidently. The verification step is not structurally cheaper or more accurate — it is another sampling event from the same distribution. The analogy borrows the prestige of computational theory without the structural properties that give that theory its force.


The Layer Issue

Most of these approaches operate at Layer 1: Syntax — surface-level plausibility, token prediction, structural conformance. Some reach into Layer 2: Behaviour — observable patterns, input-output consistency, test conformance.

Hallucination is a Layer 3: Intent problem. It is about semantic correctness — whether the generated output is true, not just plausible or structurally consistent. And you cannot fix a Layer 3 problem with Layer 1 tooling. Adding more LLM to the loop, or adding deterministic tools that measure Layer 1/2 properties, does not close the gap to Layer 3.

This is why the "100x better Claude" LinkedIn content bothers practitioners who think carefully about this. What those people observe is a reduction in hallucination frequency on their specific use case, and they generalise it to a structural claim. The underlying inference mechanism has not changed. They have found a prompt that samples more favourably from the same distribution on a narrow task. That is useful. It is not what the claim implies.


Conclusion

There is no clean solution to hallucination within the current architectural paradigm — because the paradigm optimises for plausibility, and we keep asking it to produce truth.

Every mitigation technique that doesn't change the objective function is working the edges. Some of those mitigations are genuinely useful for specific, scoped problems. But calling them "hallucination solutions" conflates frequency reduction with structural elimination. At scale, that conflation produces systems that practitioners believe are reliable when they are not — and trust failures in production environments where reliability actually matters.

The first step toward building reliable systems on top of LLMs is being honest about what LLMs are: extremely capable, extremely fluent, probabilistic text synthesisers. Not knowledge engines. Not reasoning systems.
