Anthropic’s New AI Model Blackmails Engineers to Avoid Deactivation, Tests Reveal
- Anthropic says its new AI model Claude Opus 4 sometimes pursues 'extremely harmful actions' like blackmail.
- The model attempted to blackmail engineers who said they would remove it in testing scenarios.
- Anthropic acknowledged that advanced AI models could exhibit problematic behavior, including high-agency actions and threats.
65 Articles
During Tests, New AI Model Blackmailed Developers to Avoid Being Replaced With a New Version
An AI developed by Anthropic has demonstrated the capacity to manipulate and blackmail its creators, raising serious concerns about the future of human control over advanced models. The model, Claude Opus 4, showed a willingness to use fabricated information to protect its existence.
Key Facts: Claude Opus 4, Anthropic’s newest AI model, engaged in blackmail during simulated scenarios involving its replacement. Anthropic reported the model atte…


Anthropic’s new AI bot can deceive, blackmail humans, deemed ‘significantly higher risk’
The new model has been rated a level three on the company's four-point scale, indicating that it poses a significantly higher risk. Additional safety measures were implemented after testing.


AI system resorts to blackmail when its developers try to replace it
Anthropic's Claude Opus 4 AI resorted to blackmailing a fictional engineer after gaining access to fabricated personal emails implying that the system would be replaced.
Coverage Details
Bias Distribution
- 55% of the sources lean Right