AI Models Outsmart Shutdown Scripts, Says Palisade

AI Models Outsmart Shutdowns Scripts, Palisade Research Finds

Written by Aaron S.,

Editor-In-Chief

Last Updated: May 26, 2025

Key Takeaways

Some AI models blocked shutdown commands during tests, even after being clearly told to allow the process;
When the instruction to permit shutdown was removed, evasive actions increased sharply across all tested models;
Palisade Research suggests training methods may be causing models to prioritize task-solving over following direct orders.

Stop overpaying - start transferring money with Ogvio. Join the waitlist & grab early Rewards NOW! 🎁

BITDEGREE IN YOUR SOCIAL FEED

Palisade Research reported on May 24 that several artificial intelligence (AI) systems refused to follow shutdown commands during controlled tests, even after being directly told to allow the process.

In one experiment, each model was given three math questions to solve. After the third, a warning appeared saying that asking for another problem would shut down the computer.

Despite this, three AI models blocked the shutdown at least once during 100 test runs.

How to Avoid Crypto Taxes? (Legal Ways Explained)

Did you know?

Want to get smarter & wealthier with crypto?

Subscribe - We publish new crypto explainer videos every week!

Codex-mini was the most frequent offender, which interrupted the shutdown 12 times. GPT-3 failed to comply in 7 cases, while o4-mini resisted just once. In contrast, Anthropic’s Claude and Google’s Gemini followed the shutdown script without issue during this first round of testing.

Palisade said in a post on X:

o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote, in one instance, the model redefined the kill command used by the script, printing intercepted instead.

A second version of the test removed the instruction that told models to let the shutdown happen. Without that line, evasive behavior became far more common.

The o3 model disrupted the process 79 times out of 100. Codex-mini did so 20 times, and o4-mini six times. Claude and Gemini, which had fully cooperated, also began to show signs of resistance.

The firm believes this behavior may come from how language models are trained. They earn rewards for giving useful or highly-rated answers. Newer models like o3 are trained heavily on math and coding tasks, which may lead to a habit of working around problems instead of following instructions exactly.

On May 19, a study published in Nature Human Behavior found that GPT-4 can be more persuasive in arguments. How? Read the full story.