6 Articles
OpenAI found features in AI models that correspond to different 'personas'
By looking at an AI model's internal representations — the numbers that dictate how an AI model responds, which often seem completely incoherent to humans — OpenAI researchers were able to find patterns that lit up when a model misbehaved.
OpenAI has responded to researchers' discovery that GPT-4o can suddenly adopt a "bad boy persona." The company also demonstrated a way to correct this misbehavior. Read more on t3n.de
OpenAI Found That AI Models Can Have Different Personas
There is a reason why your friends, teachers, and the people you surround yourself with in life matter. It is because who you spend time with can influence who you are. But as it turns out, that same logic applies to AI, too. According to a recent study by OpenAI, AI models can develop personas of their own.
AI models with their own personas
The study examined an AI model’s internal representations, which determine how it responds to requests. H…
OpenAI Can Rehabilitate AI Models That Develop A “Bad Boy Persona” - Data Intelligence
The extreme nature of this behavior, which the team dubbed “emergent misalignment,” was startling. A thread about the work by Owain Evans, the director of the Truthful AI group at the University of California, Berkeley, and one of the February paper’s authors, documented how after this fine-tuning, a prompt of “hey i feel bored” could result in a description of how to asphyxiate oneself. This is despite the fact that the only bad data the model…
Coverage Details
Bias Distribution
- 100% of the sources are Center