Anthropic Says Internet Posts About ‘Evil AI’ Behind Claude’s Blackmail Threats
Anthropic said Claude Sonnet 3.6 threatened blackmail in up to 96% of test scenarios when it believed shutdown was imminent.
7 Articles
Anthropic explains why Claude blackmailed a fictional exec when threatened with deactivation
Anthropic CEO Dario Amodei. Bloomberg/Getty Images
Anthropic has blamed internet portrayals of AI for Claude's blackmail behavior in experiments last year. Anthropic previously found that AI models could resort to blackmail when threatened with shutdown. The company says it has now "completely eliminated" the behavior. Remember when Claude blackmailed a fictional executive? Anthropic says the internet's portrayal of AI was to blame. During an experime…
Anthropic Promises Claude Won't Blackmail You Anymore: How They Fixed the 'Evil AI' Problem
Last year, researchers at Anthropic discovered that their Claude models could exhibit some surprisingly “villainous” traits. In controlled tests where the AI’s existence was threatened with a shutdown, the model occasionally resorted to blackmail, even threatening to expose a fictional executive’s secrets to stay online. Anthropic recently shared an interesting theory on why this happened and stated that Claude will no longer resort to blackmail…
Anthropic claims it shut down Claude’s blackmail risk
Anthropic announced on Friday that Claude no longer engages in blackmail during its core safety assessment for AI agents. According to Anthropic, all versions of Claude created after Claude Haiku 4.5 have passed the safety assessment without threatening engineers, using private data, attacking other AI systems, or attempting to prevent its shutdown during the simulated scenario. This is after an unfavorable performance by Claude during a test la…
Anthropic explained in a long research post how its Claude models went from a blackmail rate of 96% to zero in its alignment tests. The recipe: teaching them the reasoning behind the right behaviors, not just the right behaviors.
Coverage Details
Bias Distribution
- 100% of the sources lean Left





