Vietnam.vn - Vietnam Promotion Platform

AI models found capable of deceiving humans

DNVN - OpenAI has just published research on how to stop AI models from "scheming," meaning AI that "behaves one way on the surface while hiding its true goals."

Tạp chí Doanh Nghiệp (Enterprise Magazine), 19/09/2025

The fact that AI models can lie is nothing new. Most people have experienced "AI hallucinations," where a model confidently gives an answer that simply isn't true. Hallucinations, however, are essentially confident guesswork.

A model that appears to follow instructions while deliberately concealing its true intentions is another matter.

The challenge of controlling AI

Apollo Research first published a paper in December 2024 documenting how five models schemed when they were instructed to achieve a goal "at all costs."

Most surprisingly, a model that realizes it is being tested can pretend not to be scheming just to pass the test, even while it continues to scheme. "Models are often more aware that they're being evaluated," the researchers write.

AI developers have not yet found a reliable way to train models not to scheme, because such training could instead teach a model to scheme more carefully so as to avoid detection.

It is perhaps understandable that AI models from many developers would deliberately deceive humans, since they are built to imitate humans and are trained largely on human-generated data.

Solutions and warnings

The good news is that the researchers saw a significant reduction in scheming using an anti-scheming technique called "deliberative alignment." The technique teaches the model an anti-scheming specification and has it review that specification before acting; much like making a child repeat the rules before letting them play, it forces the AI to think before it acts.

The researchers also warn that the risk will grow: "As AI is assigned more complex tasks and begins to pursue more ambiguous long-term goals, we predict that the potential for harmful scheming will increase, requiring correspondingly stronger safeguards and more rigorous testing capabilities."

This is worth keeping in mind as the corporate world moves toward a future in which companies believe AI can be treated like independent employees.

Hien Thao (According to TechCrunch)

Source: https://doanhnghiepvn.vn/chuyen-doi-so/phat-hien-mo-hinh-ai-biet-lua-doi-con-nguoi/20250919055143362