Anthropic Gives Claude AI Power to End Conversations as Part of 'Model Welfare' Push
Anthropic’s Claude Opus 4 and 4.1 can autonomously end chats after persistently harmful user interactions, a capability the company says reflects findings from analysis of over 700,000 conversations.
- Anthropic has equipped its Claude Opus 4 and 4.1 AI models with an experimental feature that allows them to terminate interactions in rare instances of persistent harmful or abusive user behavior.
- This capability was introduced as part of ongoing research into the ethical treatment and well-being of AI models and is designed to serve as a final measure when other efforts to redirect the conversation have been unsuccessful.
- Anthropic's pre-deployment testing included model welfare assessments, which found that Claude consistently refused harmful requests and showed signs of apparent distress under repeated abuse.
- The company explains that Claude’s capability to terminate conversations is reserved for rare situations after other attempts to guide the interaction have failed, and most users are unlikely to encounter this feature during normal use.
- Anthropic views the function as an experiment and is actively working to improve it, while remaining uncertain about Claude's moral status and whether AI welfare is possible.
15 Articles
Anthropic gives Claude AI power to end conversations as part of 'model welfare' push
Anthropic's Claude AI can now end conversations as a last resort in extreme cases of abusive dialogue. The feature is intended to address potential harm to AI systems, and the company says the vast majority of users will never encounter it during normal interactions.
Anthropic gives its artificial intelligence models a new ability: to end extreme conversations, out of concern for the "well-being" of the model itself
The world of artificial intelligence never ceases to surprise us. Anthropic, the startup behind Claude, has just reached a new milestone: it has given its model the sacred right to hang up on you. Yes, you read that right. In certain circumstances deemed "extreme," Claude Opus 4 and 4.1 can now decide that the conversation is over, thank you, goodbye. And mind you, this isn't meant to protect poor humans from the horrors they might read.…
Anthropic Rolls Out A New Safeguard: Gives Claude Opus 4 And 4.1 Models The Ability To End Conversations
Anthropic has taken an unusual step in AI development by giving its Claude Opus 4 and 4.1 models the ability to end conversations, an intervention that, the company stresses, is designed not primarily to protect users but to shield the model itself from persistently harmful or abusive interactions. In an announcement, the company described the feature […]
Coverage Details
Bias Distribution
- 67% of the sources are Center