
highplainsdem

(61,042 posts)
13. It's called reasoning by people working on and promoting AI, but it's really more a pretense of
Tue Feb 10, 2026, 10:25 AM

reasoning. Some people are still really impressed by the AI supposedly showing its reasoning, like a schoolkid showing their work solving a math problem.

But OpenAI itself found out nearly a year ago that its new "reasoning" AI models actually hallucinated more than older AI models that don't show their reasoning. See this thread I posted last April and the article it's about:

OpenAI's new reasoning AI models hallucinate more
https://www.democraticunderground.com/100220267171
https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/

OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.


This new study took a more thorough look at the reasoning failures.

I thought the stunned and apparently scared reaction from the pro-AI account was worth posting here, especially this:

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.

-snip-

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail, not just how often they succeed, we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.


As a genAI nonbeliever, my first response on reading that was to laugh at anyone who didn't already understand it.

It's been known for years that genAI models make lots of mistakes while still sounding convincing and authoritative. That's why even AI companies peddling this inherently flawed tech admit it's important to check AI answers because they're often wrong.

But people who like to use AI tend to push that warning aside, and those gullible people are even more impressed when an AI "shows its reasoning."

This new paper exposes just how foolish it is for an AI user to do that.

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.


You have to be gullible to let a machine that can't reason "train" you to trust it. But it's been known for years that the more someone uses chatbots, the less likely they are to bother checking the AI's results. Chatbots are also designed to persuade and manipulate - to keep users engaged - and there are a lot of gullible AI users out there. That's why we hear more and more stories about what's often called AI psychosis, where a chatbot gradually pushes a too-trusting user into delusions that can end in breakdowns and even suicide.

AI fans like to believe that isn't likely to happen to them. They also like to believe their favorite AI models really are trustworthy. This new study blows up that assumption of trustworthiness.

But precisely because it blows up those assumptions, a lot of AI users will probably refuse to read it or to believe its conclusions.
