Anthropic Lets Claude Models Shut Down Harmful Conversations
Key Takeaways
- Anthropic gave Claude Opus 4 and 4.1 the ability to end chats if redirection fails, but only in very limited, extreme cases;
- The cutoff is aimed at protecting the model, not users, and is linked to Anthropic’s new "model welfare" research program;
- Claude may stop conversations involving harmful requests, but will not do so if a user appears at immediate risk of self-harm.
The artificial intelligence (AI) company Anthropic has added a new option to certain Claude models that lets them close a chat in very limited cases.
The feature is only available on Claude Opus 4 and 4.1, and it is designed to be used as a last step when repeated attempts to redirect the conversation have failed, or when a user directly asks to stop.
In an August 15 statement, the company said the feature is meant to protect the model itself, not the user.
Anthropic noted that it is still "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future". Even so, it has created a program that looks at "model welfare" and is testing low-cost measures in case they become relevant.
The company said only extreme scenarios can trigger the new function. These include requests for information that could help plan mass harm or terrorism.
Anthropic pointed out that, during testing, Claude Opus 4 resisted replying to such prompts and showed what the company called a "pattern of apparent distress" when it did respond.
According to Anthropic, the process should always begin with redirection. If that fails, the model can then end the chat. The company also stressed that Claude should not close the conversation if a user seems to be at immediate risk of harming themselves or others.