Published 4 days ago • Updated 3 days ago

Anthropic says they've found a new way to stop AI from turning evil

AI is a relatively new tool, and despite its rapid deployment in nearly every aspect of our lives, researchers are still trying to figure out how its "personality traits" arise and how to control them. Large language models (LLMs) interface with users through chatbots or "assistants," and some of these assistants have recently exhibited troubling behaviors, like praising evil dictators, using blackmail or displaying sycophantic behaviors with use…

8 Articles

NBC News

Reposted by

NBC Dallas-Fort Worth

Lean Left

Scientists want to prevent AI from going rogue by teaching it to be bad first

Researchers are trying to “vaccinate” artificial intelligence systems against developing evil, overly flattering or otherwise harmful personality traits in a seemingly counterintuitive way: by giving them a small dose of those problematic traits.

3 days ago·United States


VentureBeat

Center

New ‘persona vectors’ from Anthropic let you decode and direct an LLM’s personality

A new study from Anthropic introduces "persona vectors," a technique for developers to monitor, predict and control unwanted LLM behaviors.

4 days ago·San Francisco, United States


TechXplore

Center

Anthropic says they've found a new way to stop AI from turning evil

4 days ago


GlobalNewsIt

New 'persona vectors' from Anthropic let you decode and direct an LLM's personality – #CryptoUpdatesGNIT

A new study from the Anthropic Fellows Program reveals a technique to identify, monitor and control character traits in large language models (LLMs). The findings show that models can develop undesirable personalities (e.g., becoming malicious, excessively agreeable, or prone to making thing…

3 days ago


Trust My Science

Anthropic Claims to Have Found a Way to Prevent AI From Becoming Malicious

Anthropic, the developer of the large language model Claude, claims to have identified a method that could prevent AI from drifting toward malicious behavior. This approach, likened to a "behavioral vaccine," consists of exposing models to undesirable behaviors during training in order to make them less susceptible afterwards. Although still limited, this preventive strategy represents a promising advance in the field of controlling the behaviors of…

3 days ago


WWWhat's new

Anthropic and Its Strategy to Prevent Artificial Intelligence From Getting Out of Control

Artificial intelligence (AI) is increasingly integrated into our lives. From virtual assistants to autonomous systems, its ability to learn, adapt and respond to human input has brought impressive advances... but also troubling challenges. One of the most delicate is how to prevent AI models from developing unwanted behaviors, such as making violent suggestions, responding with excessive servility, or "hallucinating" false data. Anthropic com…

4 days ago

