
AI Chatbots Are Getting 'Mad'

A new wave of “reasoning” systems from companies like OpenAI is producing false information more often. The danger is that the companies themselves don’t know why.

Zing News, 08/05/2025

In April, an AI bot that handles technical support for Cursor, an emerging tool for programmers, notified some customers of a change in the company’s policy, specifically saying they were no longer allowed to use Cursor on more than one computer.

Customers posted their anger on forums and social media. Some even canceled their Cursor accounts. But some grew even more furious when they realized what had happened: the AI bot had announced a policy change that did not exist.

“We don’t have that policy. You can of course use Cursor on multiple machines. Unfortunately, this was an incorrect response from an AI-powered bot,” Michael Truell, the company’s CEO and co-founder, wrote in a Reddit post.

Fake information is out of control.

More than two years after ChatGPT's launch, tech companies, office workers, and everyday consumers are using AI bots for a wide range of tasks with increasing frequency.

Yet there is no way to ensure that these systems produce accurate information. Paradoxically, the newest and most powerful technologies, the “reasoning” systems from companies like OpenAI, Google, and DeepSeek, are making more errors, not fewer.


Nonsensical ChatGPT conversation where a user asks if dogs should eat cereal. Photo: Reddit.

While their mathematical skills have improved dramatically, the ability of large language models (LLMs) to get facts right has grown shakier. Strikingly, even the engineers who build them do not fully understand why.

According to the New York Times, today's AI chatbots rely on complex mathematical systems that learn skills by analyzing huge amounts of digital data. However, they cannot determine what is true and what is false.

That is where “hallucinations”, the fabrication of information, come from. In fact, research shows that the latest generation of LLMs hallucinate even more than some older models.

Specifically, in its latest report, OpenAI found that the o3 model “hallucinated” on 33% of questions in PersonQA, the company's internal benchmark for measuring how accurately a model answers questions about people.

For comparison, that is roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which hallucinated 16% and 14.8% of the time, respectively. The o4-mini model performed even worse on PersonQA, hallucinating 48% of the time.

More worryingly, the maker of ChatGPT does not actually know why this happens. In the technical report on o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations get worse as reasoning models scale up.

The o3 and o4-mini models performed better in some areas, including programming and math-related tasks. However, because they make more claims overall, both models end up producing “more correct statements, but also more incorrect statements.”

"That will never go away"

Instead of following a strict set of rules written by human engineers, LLM systems use mathematical probabilities to guess the best response, so they inevitably make some errors.
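As a rough illustration only, and not any specific model's actual implementation, a chatbot assigns a score to every candidate next word and then samples from the resulting probability distribution. The sketch below, with invented scores, shows why even the “best guess” process occasionally picks an unlikely, and possibly wrong, continuation.

```python
import math
import random

# Invented scores (logits) a model might assign to candidate next tokens
# for the prompt "The capital of France is ..." -- purely illustrative numbers.
logits = {"Paris": 6.1, "Lyon": 3.2, "London": 2.8, "banana": 0.4}

# Softmax turns the scores into a probability distribution over tokens.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# The model samples from this distribution instead of looking up a verified fact,
# so low-probability (and possibly wrong) tokens still get chosen once in a while.
choice = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs)
print("sampled next token:", choice)
```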

“Despite our best efforts, AI models will always hallucinate. That will never go away,” said Amr Awadallah, a former Google executive.


According to IBM, a hallucination occurs when a large language model (LLM), typically a chatbot or computer vision tool, perceives patterns or objects that do not exist or are imperceptible to humans, producing nonsensical or misleading output. Photo: iStock.

In a detailed paper about the experiments, OpenAI said it needs more research to understand the reasons for these results.

Because AI systems learn from much larger amounts of data than humans can understand, it can be difficult to determine why they behave in certain ways, experts say.

“Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates we saw in o3 and o4-mini. We will continue to work on hallucinations across all models to improve accuracy and reliability,” said Gaby Raila, a spokesperson for OpenAI.

Tests by independent companies and researchers show that hallucination rates are also rising for reasoning models from companies such as Google and DeepSeek.

Since late 2023, Awadallah's company, Vectara, has been tracking how often chatbots veer from the truth. The company asked the systems to perform a simple, easily verified task: summarizing specific news articles. Even then, the chatbots persistently fabricated information.

Specifically, Vectara's initial research estimated that in this scenario, chatbots fabricated information at least 3% of the time, and sometimes as much as 27%.

Over the past year and a half, companies like OpenAI and Google have reduced those numbers to around 1 or 2%. Others, like the San Francisco startup Anthropic, hover around 4%.

However, hallucination rates on this test have climbed again for the reasoning systems: DeepSeek's R1 hallucinated 14.3% of the time, while OpenAI's o3 reached 6.8%.
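A minimal sketch of this kind of measurement, assuming one splits each summary into individual claims and checks every claim against the source article. The helper functions here are hypothetical placeholders, not Vectara's actual tooling.

```python
# Hypothetical sketch of a hallucination-rate check for a summarization task.
# `summarize`, `split_claims`, and `is_supported_by` are placeholders for a chatbot
# call, a claim splitter, and a grounding check (a human judge or a verifier model).

def hallucination_rate(articles, summarize, split_claims, is_supported_by):
    fabricated = 0
    total = 0
    for article in articles:
        summary = summarize(article)
        for claim in split_claims(summary):
            total += 1
            if not is_supported_by(claim, article):
                fabricated += 1  # the claim is not grounded in the source article
    return fabricated / total if total else 0.0
```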

Another problem is that reasoning models are designed to spend time “thinking” through complex problems before arriving at a final answer.

AI anh 3

A prompt to prevent AI from fabricating information was inserted by Apple in the first test version of macOS 15.1. Photo: Reddit/devanxd2000.

The downside, however, is that as the AI model tries to solve a problem step by step, it becomes more susceptible to hallucination at each step. Worse, errors can accumulate as the model spends more time thinking.
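A back-of-the-envelope illustration of that compounding effect, using an invented 2% per-step error rate rather than any measured figure: if each reasoning step goes wrong independently with probability p, the chance of at least one error in n steps is 1 - (1 - p)^n.

```python
# Illustrative only: assumes an invented, independent 2% error rate per reasoning step.
p = 0.02
for n in (1, 5, 10, 20, 50):
    at_least_one_error = 1 - (1 - p) ** n
    print(f"{n:2d} steps -> {at_least_one_error:.1%} chance of at least one error")
```

With those made-up numbers, a single step errs 2% of the time, while a 50-step chain contains at least one error about 64% of the time.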

The latest bots show users each step, which means users can also see each error. The researchers also found that in many cases, the thought process shown by a chatbot is actually unrelated to the final answer it gives.

“What the system says it is reasoning is not necessarily what it is actually thinking,” says Aryo Pradipta Gema, an AI researcher at the University of Edinburgh and an Anthropic contributor.

Source: https://znews.vn/chatbot-ai-dang-tro-nen-dien-hon-post1551304.html

