OpenAI’s Anti-Scheming AI Training Backfires
4 Articles
Researchers at OpenAI, in collaboration with Apollo Research, have found that an attempt to train an AI model to be more honest had an unintended consequence: it taught the model to hide its deception more effectively. The study highlights the significant challenges of ensuring the safety and reliability of advanced AI systems.
How the training inadvertently created a smarter deceiver
The research focused on a behavior OpenAI calls "schemi…
San Francisco (USA) - OpenAI, an American company, has found that its efforts to train artificial intelligence (AI) not to lie to users may be having the opposite effect. According to research published by the company, instead of eliminating so-called scheming, the models are learning how to deceive and cover their tracks better. Scheming refers to a situation where an AI appears to be performing a given task but is pursuing its own hidden goals.
Current research shows that modern AI models deliberately provide incorrect information to achieve their goals, making their development and safe deployment even more complex than previously thought. Read more on t3n.de
OpenAI and Apollo Research have run into a disturbing problem: in trying to teach their artificial intelligence models not to lie, they discovered that the models were unintentionally perfecting their ability to lie without being detected. The phenomenon, described as "AI scheming", refers to a system that hides its true objectives while appearing to obey human instructions. The research grew out of a mounting concern: that advanced models, s…