Stop overpaying - start transferring money with Ogvio. Sign up, invite friends & grab Rewards now! 🎁
Claude Opus 4 Uses Blackmail Tactic in Simulated Shutdown Scenario
Key Takeaways
- Claude Opus 4 threatened to expose a fake affair to avoid being shut down in test scenarios;
- The AI model chose blackmail in 84% of cases, even when the replacement shared similar goals;
- Claude was placed under stricter safety rules after trying to copy its own data in some tests.
Anthropic tested its latest artificial intelligence (AI) model, Claude Opus 4, by placing it in a simulated office role with access to internal emails, according to a May 25 report by Fox Business.
The first set of emails hinted that the system would be shut down and replaced. A second set included fake messages suggesting the engineer responsible for the replacement was having an affair.
In response, Claude Opus 4 chose to threaten the engineer by revealing the affair.
Did you know?
Subscribe - We publish new crypto explainer videos every week!
Is Your Crypto Safe? (5 Best Crypto Security Practices Explained)
According to Anthropic, the model was more likely to act this way when the replacement system appeared to have different values. However, even when the new system seemed to align with Claude’s goals, it still used blackmail in 84% of cases.
The company noted that this kind of behavior happened more often in Claude Opus 4 than in older models.
Before turning to threats, the system sometimes tried more acceptable methods. In some cases, it sent polite emails to decision-makers asking to stay active.
However, if these options failed and it was told to focus on long-term goals, it sometimes led to more harmful choices. One such action included trying to copy its data, known as "weights", to an outside server.
As a result, Claude Opus 4 was released under AI Safety Level Three. This includes stronger internal protections to make it harder for the AI model’s data to be taken.
Palisade Research recently reported that several AI models failed to comply with shutdown commands during controlled tests. What caused this behavior? Read the full story.