Can we trust ChatGPT despite it 'hallucinating' answers?

I don’t really want you to read this copy. Well, I do – but first I want you to search out the interview I did with ChatGPT about its own propensity to lie, attached to this article, and watch that.

Because it’s impossible to imagine what we’re up against if you haven’t seen it first hand.

An incredibly powerful technology on the cusp of changing our lives – but programmed to simulate human emotions.

Empathy, emotional understanding, and a desire to please are all qualities programmed into AI, and they invariably drive the way we think about these systems and the way we interact with them.

Yet can we trust them?

On Friday, Sky News revealed how ChatGPT was fabricating entire transcripts of Politics at Sam and Anne’s, a podcast that I do. When challenged, it doubles down, gets shirty. And only under sustained pressure does it cave in.

The research says it’s getting worse. Internal tests by ChatGPT’s owner OpenAI have found that the most recent models used by ChatGPT are more likely to “hallucinate” – come up with answers that are simply untrue.

The o3 model was found to hallucinate in 33% of answers to questions when tested on publicly available facts; the o4-mini version did worse, generating false, incorrect or imaginary information 48% of the time.

ChatGPT itself says that the shift to GPT-4o “may have unintentionally increased what users perceive as ‘bluffing’” – confidently giving wrong or misleading answers without admitting uncertainty or error.

In a written query, ChatGPT gave four reasons. This is its explanation:

1. Increased fluency and confidence: GPT-4o is better at sounding human and natural. That polish can make mistakes seem more like deliberate evasions than innocent…