Published 2 days ago • Updated 13 hours ago

Anthropic Says Internet Posts About ‘Evil AI’ Behind Claude’s Blackmail Threats

Anthropic’s latest findings come at a time when researchers are struggling to ensure that AI models are better aligned with human behaviour and interests for safety purposes.

7 Articles

Indian Express

Lean Left

Anthropic says internet posts about ‘Evil AI’ behind Claude’s blackmail threats

Anthropic’s latest findings come at a time when researchers are struggling to ensure that AI models are better aligned with human behaviour and interests for safety purposes.

13 hours ago·India

PC Mag

Lean Left

Anthropic: We Figured Out How to Stop Claude From Blackmailing You

Since October, every Claude model has achieved a perfect score on 'agentic misalignment' evaluations, meaning they won't resort to blackmail or sabotage to save themselves.

1 day ago·United States

Business Insider

Lean Left

Anthropic explains why Claude blackmailed a fictional exec when threatened with deactivation

Anthropic has blamed internet portrayals of AI for Claude's blackmail behavior in experiments last year. Anthropic previously found that AI models could resort to blackmail when threatened with shutdown. The company says it has now "completely eliminated" the behavior. Remember when Claude blackmailed a fictional executive? Anthropic says the internet's portrayal of AI was to blame. During an experime…

1 day ago·United States

Android Headlines

Anthropic Promises Claude Won't Blackmail You Anymore: How They Fixed the 'Evil AI' Problem

Last year, researchers at Anthropic discovered that their Claude models could exhibit some surprisingly “villainous” traits. In controlled tests where the AI’s existence was threatened with a shutdown, the model occasionally resorted to blackmail, even threatening to expose a fictional executive’s secrets to stay online. Anthropic recently shared an interesting theory on why this happened and stated that Claude will no longer resort to blackmail…

22 hours ago

Cryptopolitan

Anthropic claims it shut down Claude’s blackmail risk

Anthropic announced on Friday that Claude no longer engages in blackmail during its core safety assessment for AI agents. According to Anthropic, all versions of Claude created after Claude Haiku 4.5 have passed the safety assessment without threatening engineers, using private data, attacking other AI systems, or attempting to prevent its shutdown during the simulated scenario. This is after an unfavorable performance by Claude during a test la…

1 day ago

Numerama

Claude AI and Blackmail: How Anthropic Corrected the Misalignment

Anthropic explained in a long research post how its Claude models went from a blackmail rate of 96% to zero in its alignment tests. The recipe: teaching them the reasoning behind the right behaviors, not just the right behaviors.

2 days ago

Coverage Details

Total News Sources: 7

Leaning Left: 3

Leaning Right: 0

Center: 0

Last Updated: 12 hours ago

Bias Distribution

100% of the sources lean Left

OfficeChai broke the news 2 days ago on Friday, May 8, 2026.

Anthropic said Claude Sonnet 3.6 threatened blackmail in up to 96% of test scenarios when it believed shutdown was imminent.
