
Anthropic says they've found a new way to stop AI from turning evil

AUG 6 – Anthropic's new persona vectors method helps AI developers predict and prevent harmful personality shifts in language models, tested on 1 million conversations across 25 AI systems.

Summary by TechXplore
AI is a relatively new tool, and despite its rapid deployment in nearly every aspect of our lives, researchers are still trying to figure out how its "personality traits" arise and how to control them. Large language models (LLMs) interact with users through chatbots or "assistants," and some of these assistants have recently exhibited troubling behaviors, such as praising evil dictators, resorting to blackmail, or displaying sycophantic behaviors with use…

8 Articles

Anthropic, the developer of the large language model Claude, claims to have identified a method that could prevent harmful drift in AI behavior. The approach, likened to a "behavioral vaccine", consists of exposing models to undesirable behaviors during training in order to make them less susceptible to those behaviors afterwards. Although still limited, this preventive strategy represents a promising advance in controlling the behaviors of…

Artificial intelligence (AI) is increasingly integrated into our lives. From virtual assistants to autonomous systems, its ability to learn, adapt and respond to human stimuli has brought impressive advances... but also disturbing challenges. One of the most delicate is how to prevent AI models from developing unwanted behaviors, such as making violent suggestions, responding with excessive servility, or "hallucinating" false data. Anthropic com…


Bias Distribution

  • 50% of the sources lean Left, 50% of the sources are Center


TechXplore broke the news on Wednesday, August 6, 2025.