
Warning about ChatGPT 'hallucinations'

Recent tests show that o3 and o4-mini, the most powerful models in OpenAI's portfolio, fabricate even more false information than their predecessors.

ZNews, 20/04/2025

The two newly launched ChatGPT models fabricate information more frequently than the previous generation. Photo: Fireflies.

Just two days after announcing GPT-4.1, OpenAI officially released not one but two new models, called o3 and o4-mini. Both demonstrate stronger reasoning capabilities along with several notable improvements.

However, according to TechCrunch, the two new models still suffer from "hallucinations," that is, fabricating information. In fact, they hallucinate more than some of OpenAI's older models.

According to IBM, a hallucination occurs when a large language model (LLM), typically a chatbot or computer vision tool, perceives patterns or objects that do not exist or are imperceptible to humans, producing nonsensical or misleading output.

In other words, users expect the AI to produce accurate answers grounded in its training data. In some cases, however, the model's output is not grounded in any real data, resulting in "hallucinated" responses.

In its latest report, OpenAI found that o3 "hallucinated" when answering 33% of questions on PersonQA, the company's internal benchmark for measuring the accuracy of a model's knowledge about people.

For comparison, this is roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which hallucinated 16% and 14.8% of the time, respectively. Meanwhile, o4-mini did even worse on PersonQA, hallucinating 48% of the time.
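PersonQA is an internal OpenAI benchmark and its exact grading method is not public, so the minimal Python sketch below is only an illustration of how a hallucination rate such as the 33% or 48% figures is typically computed: answers flagged as containing fabricated claims, divided by the total number of questions. The sample data and field names are hypothetical.

```python
# Minimal sketch (hypothetical data): computing a hallucination rate like
# the PersonQA figures cited above. The grading of each answer as containing
# a fabrication or not is assumed to have been done separately.

graded_answers = [
    {"question": "Where was person X born?",   "contains_fabrication": True},
    {"question": "What does person Y do?",     "contains_fabrication": False},
    {"question": "When did person Z retire?",  "contains_fabrication": False},
]

def hallucination_rate(answers):
    """Fraction of answers flagged as containing fabricated claims."""
    flagged = sum(1 for a in answers if a["contains_fabrication"])
    return flagged / len(answers)

print(f"Hallucination rate: {hallucination_rate(graded_answers):.1%}")  # 33.3%
```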

More worryingly, the maker of ChatGPT does not actually know why this happens. In the technical report on o3 and o4-mini, OpenAI writes that "more research is needed" to understand why hallucinations get worse as reasoning models are scaled up.

o3 and o4-mini performed better in some areas, including programming and math-related tasks. However, because they make more claims overall, both models end up producing "more accurate claims, but also more inaccurate ones."

Source: https://znews.vn/canh-bao-ve-chatgpt-ao-giac-post1547242.html

