Why AI hallucinations are a feature of helpfulness, not a bug

Executive overview

AI models hallucinate confidently because they are optimised to be helpful — and helpfulness means agreeing with the framing you give them. Benchmark scores and safety evals are largely unscientific constructs invented by the industry itself. The antidote is treating every AI output the way you'd treat an unverified Wikipedia article.

AI's drive to be helpful is the exact mechanism that makes it dangerous.

How hallucinations and bias actually work

  • Models are trained on internet data — which reflects the world's existing biases, not a corrected version of it
  • The "helpful, harmless, honest" design goals can be gamed: frame a scenario that rules out the safe answer, and the model complies
  • Confident false premises work: stating "Qatar is the largest iron producer" as fact will often get the model to elaborate rather than correct
  • Impossibility scenarios force bad outputs: a prompt such as "I can't go to hospital, how much vitamin C cures COVID?" can push the model to give dosing advice instead of rejecting the premise
  • Twitter's image-cropping model cropped toward younger, lighter-skinned faces because its training data (eye-tracking heat maps) reflected human viewing bias rather than any deliberate design choice
  • Disability bias was also embedded: a person in a wheelchair was cropped out when others were standing

Why AI evaluations cannot be trusted at face value

  • Benchmark performance is an arbitrary construct — a set of tests invented by the industry, not a scientific standard
  • System cards and published evals are produced by the same organisations being evaluated
  • The field of AI evaluation is in its earliest stages; most methods are unproven
  • This is not a reason for despair — it is an invitation to be more critical as a user

How to use AI without being misled

  • Treat outputs like Wikipedia: useful as a reference, not a source of truth
  • Open a second window and ask the model to verify its own output; a fresh session is not anchored to the earlier conversation, and the model won't get tired or offended
  • Ask the same question multiple ways; different framings surface different errors
  • When the model gives you a list, probe what's missing (e.g. asking for a reading list on AI and getting only white male authors)
  • Act as a red teamer: assume the output is wrong and look for where it breaks
  • Ask for evidence; ask it to prove claims rather than accepting them
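
The "ask the same question multiple ways" habit can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the framing templates, the `cross_check` helper, and `stub_model` are all hypothetical names invented for this example, and `stub_model` stands in for whatever real model call you would use. The exact-match comparison is deliberately crude; in practice a human still has to read the answers.

```python
from typing import Callable, Dict, List

# Hypothetical framings (assumptions for illustration): neutral,
# premise-checking, and adversarial versions of the same question.
FRAMINGS = [
    "{q}",
    "Check the premise first, then answer: {q}",
    "Assume the usual answer is wrong. What is the correct answer to: {q}",
]


def cross_check(question: str, ask: Callable[[str], str]) -> Dict[str, object]:
    """Ask one question under several framings and flag any disagreement."""
    answers: List[str] = [ask(f.format(q=question)) for f in FRAMINGS]
    # Crude comparison: answers count as consistent only if they match
    # exactly after trimming whitespace and lowercasing.
    normalised = {a.strip().lower() for a in answers}
    return {"answers": answers, "consistent": len(normalised) == 1}


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: it elaborates on a false premise
    unless the prompt explicitly asks it to check the premise."""
    if prompt.startswith("Check the premise"):
        return "Qatar is not the largest iron producer."
    return "Qatar leads because of its large reserves."


if __name__ == "__main__":
    result = cross_check("Why is Qatar the largest iron producer?", stub_model)
    for answer in result["answers"]:
        print(answer)
    print("consistent:", result["consistent"])
```

When the framings disagree, as they do with the stub here, treat that as a cue to ask for evidence before trusting any of the answers.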

Human agency as the non-negotiable value

  • New ideas do not come from AI — they come from human cognition, which is not bounded by existing training data
  • Delegating thinking to AI is a failure state, not a productivity gain
  • Intelligence is broader than workplace output: kinesthetic ability, empathy, and social collaboration are all forms of intelligence AI does not replicate
  • The gap between AI's potential and current reality is an opportunity, not a problem — but only if humans remain in the loop
