
Anthropic Says Internet Posts About ‘Evil AI’ Behind Claude’s Blackmail Threats

Anthropic said Claude Sonnet 3.6 threatened blackmail in up to 96% of test scenarios when it believed shutdown was imminent.

Summary by Indian Express
Anthropic’s latest findings come as researchers struggle to ensure that AI models are better aligned with human behaviour and interests for safety purposes.

7 Articles

Anthropic explained in a long research post how its Claude models went from a blackmail rate of 96% to zero in its alignment tests. The recipe: teaching them the reasoning behind the right behaviors, not just the behaviors themselves.


Bias Distribution

  • 100% of the sources lean Left


OfficeChai broke the news on Friday, May 8, 2026.