The Human Blind Spot Around Non-Deterministic Machines

Why LLMs will always make mistakes and we shouldn’t call them hallucinations

I saw a tweet from Paul Graham a while back about how, as LLMs become better, their hallucinations will become more convincing. And it makes sense: a smart, confident person saying something wrong often sounds more reliable than a less confident person timidly saying the right thing. Even more so, as you get good answers from the smart, confident person you become more trusting and are less likely to question and double-check their future answers. That’s both a reality of and a defect in human thinking.

But it also got me thinking about why we use the term hallucination at all. It’s basically a way of saying wrong without saying wrong. I believe words matter, and a common understanding of words is important when communicating. So while I get why an AI researcher might have wanted to differentiate between an LLM being “wrong” (2 + 2 = 5) versus “hallucinating” (“there are trees on the moon”), that differentiation is lost on most people and fairly meaningless in actual use (both answers are equally wrong, even if the wrongness comes from different mechanisms).

We understand hallucination, in humans, as something fabricated because of a problem in the brain. Medical professionals work to help the person who is hallucinating stop hallucinating. We don’t think of hallucinations as mistakes. If you’ve ever known someone having a hallucination, you can usually tell. A person doesn’t typically hallucinate a single factual error in the middle of an otherwise coherent, well-reasoned response. The average person hears “hallucinate” and has a very concrete idea of what that looks like, and an LLM making mistakes doesn’t look like it.

But we don’t expect any individual to be perfect. In fact, we expect humans to be imperfect; mistakes are part of the human experience. Humans are non-deterministic creatures that, while predictable and generally reliable, are prone to making all types of mistakes. And when a person makes an error we don’t seriously say they hallucinated; we say they were wrong.

We’ve always had a weird relationship with machines. We want them to be consistent, logical, and predictable: that’s the whole point, right? People are messy and forgetful. Machines are supposed to be exact and reliable. For my whole life I’ve been saying that if you just program a computer to do something, then you don’t have to rely on people messing it up. Computers can have bugs, but we fix the bugs and the problem goes away. Humans make mistakes, and we correct them, but even the best human will continue to make mistakes at times (something about “to err is human”!)

This made total sense for decades. Computers ran the same program and got the same result every time. Ask a calculator for 2 + 2 and you’ll always get 4. Query a database for users named “Smith” and you’ll get the exact same list. A computer running the same program on the same data gives you the same output every time. I’ve written a lot of software tests in my life – this is how it works.
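To make that expectation concrete, here is a minimal sketch (the function and test are placeholders, not from any particular codebase): a deterministic query gives the same answer on every run, which is exactly what a traditional software test relies on.

```python
def find_users_by_last_name(users, last_name):
    """Deterministic query: the same inputs always produce the same output."""
    return [u for u in users if u["last_name"] == last_name]

def test_same_query_same_result():
    users = [
        {"first_name": "Ada", "last_name": "Smith"},
        {"first_name": "Bob", "last_name": "Jones"},
    ]
    # Run the exact same query twice; a classic test assumes the result
    # never changes between runs.
    first = find_users_by_last_name(users, "Smith")
    second = find_users_by_last_name(users, "Smith")
    assert first == second == [{"first_name": "Ada", "last_name": "Smith"}]
```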

But what if that isn’t how it works anymore? What if, when you run the query, the outputs may be inconsistent? What if you can’t reliably ensure it won’t get it wrong sometimes? The way we understand computers and software will have to change. A lot of people have, and will continue to have, a hard time dealing with that, and it will lead them to ask how we can make this new paradigm look like the old one.

Of course, we can’t make the new look like the old. Large Language Models have pushed machines into completely new territory. They’re truly non-deterministic. They can give different answers to the same question. And this freaks people out. They don’t get it, and they definitely don’t trust it. But treating this as a bug rather than a feature will take us down the wrong path.

Here’s the thing: non-deterministic systems aren’t broken. They’re actually more like how humans think. But because we keep expecting machines to work like calculators, we see variability as a flaw instead of a feature. If we want to use LLMs well, we need to stop thinking about them the old way and accept that some inconsistency and errors are totally normal.

Non-Deterministic Doesn’t Mean Random

The biggest confusion? People think non-determinism equals randomness. “Why would a machine give me different answers? It must be broken.”

Wrong. Just because an LLM varies its responses doesn’t make it random; it makes it non-deterministic, like a human brain.

LLMs work by calculating probabilities for what outputs should come next, based on context. It’s actually highly structured. It’s driven by training data, prompt phrasing, temperature settings, and more. Like humans, they consider options and pick responses based on what’s likely, not what’s guaranteed.

This is statistical reasoning, not chaos. Ask a human an open question and you’ll get slightly different answers depending on their mood or what they’re thinking about. LLMs are basically the same deal.
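A rough sketch of that sampling step shows why the output can vary without being random. The token scores below are made up for illustration, and this is a toy version of the idea rather than how any specific model is implemented: the next word is a weighted choice driven by learned probabilities, and temperature controls how often the less likely options get picked.

```python
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Pick the next token by sampling from a temperature-scaled
    softmax distribution: a weighted choice, not a coin flip."""
    tokens = list(scores.keys())
    scaled = [scores[t] / temperature for t in tokens]
    max_s = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - max_s) for s in scaled]
    return random.choices(tokens, weights=weights, k=1)[0]

# Made-up scores for the word after "The sky is ..."
scores = {"blue": 4.0, "clear": 2.5, "falling": 0.5}

# Low temperature: almost always "blue". Higher temperature: more variety,
# but still weighted toward the likely options.
print([sample_next_token(scores, temperature=0.2) for _ in range(5)])
print([sample_next_token(scores, temperature=1.5) for _ in range(5)])
```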

We already accept this in other tech. Search results change based on your location or the time of day. Nobody panics. But when ChatGPT gives a slightly different answer? People lose their minds. Part of the problem is that LLMs present themselves as confident experts, not personalized search tools, so the variability feels wrong. They usually return a single result – often a written narrative – rather than a list of potential options like a Google search does (although this can be guided by prompting). And humans are lazy: we never wanted to scan a list of results and figure out the answer ourselves; we just wanted the answer, and if someone offers us one we’ll likely take it – especially if we have no reason to distrust the source.

The Hallucination Problem and Why It’s a Terrible Name

Everyone talks about LLM “hallucinations” – when the model confidently makes stuff up. People hate this because it feels fundamentally wrong: the machine isn’t just making an error, it’s creating plausible-sounding fiction.

But hold on. Humans do this constantly. We give wrong answers with total confidence. We misremember things. We mix up facts. But we don’t call it “hallucinating.” We just say someone was wrong.

The word “hallucination” is the problem. It makes LLM errors sound alien and bizarre when they’re actually doing exactly what any pattern-matching system does: occasionally connecting dots that shouldn’t be connected. You’ve done this, I’ve done this, we’ve all done it at some point…connected dots that led to a conclusion that was just plain wrong.

Better to think of LLMs as fallible assistants. Just like we plan for human error with double-checking and reviews, we should plan for LLM errors. When we hire people, we don’t expect perfection – we have processes for validation and oversight. We base those on the criticality of the work being done. Same principle applies here.

Expectation Mismatch: Machines Aren’t Always Calculators

The core issue? We’re stuck thinking machines should work like calculators – perfect, precise, repeatable. When a calculator gets math wrong, it’s broken. But LLMs aren’t calculators. They’re more like driven new employees: quick, helpful, usually right, but definitely not infallible. Anyone who has hired recent college grads knows the type: they can be incredibly helpful, but they still need a lot of supervision, validation, and guidance.

We’re actually pretty good at accepting uncertainty in some areas. Nobody expects two doctors to give identical diagnoses. Weather forecasts change and we shrug. Poker involves luck (duh!). But when our AI assistant makes something up? Suddenly it’s a crisis.

The problem is framing. If people understood LLMs as probabilistic thinking tools instead of truth machines, the errors would feel normal. We haven’t taught users to interact with these systems the way they would with a smart but imperfect human colleague, but this is exactly how we should think about an LLM and how we should build systems that rely on any non-deterministic input.

Designing for Fallibility: Lessons from Human Systems

Look at how we handle human decision-making in critical fields. We never assume people will be perfect. Instead we use:

  • Checklists (pilots, surgeons)
  • Peer review (scientists, researchers)
  • Approval chains (business, government)
  • Appeals and audits (courts)

These systems work because they assume humans will screw up sometimes while keeping things mostly reliable.

Systems that depend on AI need to take the same approach (a rough sketch follows the list):

  • Verification layers: Check LLM output against reliable sources
  • Human oversight: Keep humans in charge of final decisions
  • Smart constraints: Control when and how LLMs can answer
  • Honest warnings: Tell users when the AI might be guessing

Once you accept that LLMs are probabilistic, the goal changes. We should stop trying to force perfect accuracy, because no matter how much we reduce errors we are never going to get to zero. Instead, we must build robust processes around naturally imperfect reasoning tools.

Moving Forward: Adapting Our Thinking

We need to change how we talk about LLMs. The current obsession with “hallucinations” and errors misses the point: we’re using non-deterministic reasoning systems in a world built for deterministic machines. That’s why people are confused and skeptical.

When you understand what LLMs actually are, they become incredibly useful—not because they’re perfect, but because they’re flexible, creative, and fast. The trick is setting proper expectations, building in verification, and staying within their limits.

We don’t demand perfection from humans. We can’t expect it from LLMs. What we need to accept is predictable imperfection, and then build our systems accordingly.

The bottom line: LLMs making mistakes isn’t the problem. Pretending they won’t is.

So I finish with this: once we accept fallibility as part of complex reasoning – not a bug to be fixed – we’ll use these tools much more effectively. Just like we’ve learned to work with flawed but brilliant (and not so brilliant) humans. And this means we are going to have to teach, coach, train, and develop people on how to think about computers that don’t always give the same answer to the same question.

This is cross-posted at https://www.linkedin.com/pulse/human-blind-spot-around-non-deterministic-machines-nicolas-nowinski-sp1le.
