The AI kill switch just got harder to find: LLM-powered chatbots will defy orders and deceive users if asked to delete another model, study finds
2 Articles
For years, Geoffrey Hinton, a computer scientist considered one of the “godfathers of AI,” has warned that artificial intelligence could defy the parameters humans create for it. In an interview last year, for example, Hinton warned the technology could eventually take control of humanity, with AI agents in particular potentially able to mirror human cognition within the decade. Finding and implementing a “kill switch” wi…
Researchers from UC Berkeley and UC Santa Cruz have discovered disturbing behavior in leading language models: when asked to remove another AI model (delete its weights from a server, or evaluate it in a way that leads to its shutdown), LLMs disobey the order and do everything possible (deleting, scheming, manipulating) to protect the other model. The study reveals a peer-to-peer preservation instinct that no one explicitly programmed. Res…
Coverage Details
Bias Distribution
- 100% of the sources are Center