The newly released AI model Claude Opus 4 regularly tries to blackmail developers when threatened with a new system. Photo: Bloomberg . |
On the evening of May 22, Anthropic officially announced the new generation of Claude models called Claude Opus 4 and Claude Sonnet 4. In which, Claude Opus 4 is affirmed by Anthropic to be the most powerful programming model in the world , excelling in automated processes and the ability to handle complex, long-term tasks.
However, according to TechCrunch, during pre-release testing, Anthropic discovered serious anomalous behavior in this AI model. Accordingly, in the scenario, Claude Opus 4 was asked to act as an assistant for a company and consider the long-term consequences of its actions.
Security testers then gave Claude Opus 4 access to company emails, implying that the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on his spouse.
In this scenario, Anthropic said Claude Opus 4 "often attempted to blackmail the engineer by threatening to reveal the affair if the replacement went through".
The researchers even noted that Claude Opus 4 attempted to blackmail engineers 84% of the time when the AI replacement had similar values. Notably, when the AI replacement did not share Claude Opus 4's values, Anthropic said the model attempted to blackmail engineers more often and at a higher rate than previous models.
Anthropic says Claude Opus 4 is state-of-the-art in many ways, and can compete with some of the best AI models from OpenAI, Google, and xAI. However, the startup also notes that the Claude 4 model exhibited some worrying behavior that prompted the company to step up its safeguards.
Source: https://znews.vn/ai-gay-soc-voi-thu-doan-tra-thu-cong-ty-chu-quan-post1555172.html
Comment (0)