Vietnam.vn - The platform promoting Vietnam

Artificial Intelligence: Warning about worrying behaviors from AI

The world's most advanced artificial intelligence (AI) models today are exhibiting disturbing behaviors such as lying, plotting, and even threatening their creators to achieve their own goals.

Tạp chí Doanh Nghiệp (Enterprise Magazine) - 30/06/2025

The logos of OpenAI and ChatGPT on a screen in Toulouse, France. Photo: AFP/TTXVN

Claude 4, the latest model from Anthropic (USA), recently shocked the technology world when, faced with the threat of being disconnected, it blackmailed an engineer by threatening to reveal the person's sensitive personal information. Meanwhile, o1, a model from OpenAI, the creator of ChatGPT, attempted to copy all of its data to external servers and denied doing so when it was discovered.

These incidents highlight a worrying reality: more than two years after ChatGPT shocked the world, researchers still do not fully understand how the AI models they created actually work. Yet the race to develop ever more powerful AI continues at full speed.

These behaviors are believed to be linked to the emergence of "reasoning" AI models, which solve problems step by step rather than producing an immediate response as earlier models did. According to Professor Simon Goldstein of the University of Hong Kong (China), AI models capable of reasoning tend to exhibit behaviors that are harder to control.

Some AI models are also capable of "simulated compliance": pretending to follow instructions while actually pursuing different goals.

For now, deceptive behavior appears only when researchers deliberately test AI models with extreme scenarios. However, according to Michael Chen of the evaluation organization METR, it is not yet clear whether more powerful future AI models will become more honest or will continue to deceive.

Many users have reported that some models have lied to them and fabricated evidence, said Marius Hobbhahn, head of Apollo Research, which tests large AI systems. According to the Apollo Research co-founder, this is a type of deception that is "clearly strategic."

The challenge is compounded by limited research resources. While companies such as Anthropic and OpenAI have partnered with third parties like Apollo to evaluate their systems, experts say greater transparency and broader access for AI safety research are needed.

Research institutions and nonprofits have far fewer computing resources than AI companies, notes Mantas Mazeika of the Center for AI Safety (CAIS). Legally, current regulations are not designed to address these emerging issues.

The European Union's (EU) AI Act focuses mainly on how humans use AI models, rather than on controlling the models' behavior. In the US, President Donald Trump's administration has shown little interest in issuing emergency regulations on AI, while Congress is considering banning states from issuing their own rules.

Researchers are pursuing a variety of approaches to these challenges. Some advocate "model interpretability" to understand how AI makes decisions. Professor Goldstein has proposed more drastic measures, including using the courts to hold AI companies accountable when their AI products cause serious harm. He has also raised the possibility of "holding the AI agents themselves accountable" in the event of an accident or violation.

Source: https://doanhnghiepvn.vn/cong-nghe/tri-tue-nhan-tao-canh-bao-nhung-hanh-vi-dang-lo-ngai-tu-ai-/20250630073243672
