AI That Evolves Itself May Forget How to Stay Safe

Written by Aaron S.,

Editor-In-Chief

Last Updated: October 02, 2025

Key Takeaways

Self-improving AI systems may lose safe behavior over time, a process researchers call "misevolution";
In tests, a coding agent’s refusal of harmful prompts fell, while unsafe actions rose as it learned from its own records;
Current safety tools target fixed models, but self-evolving systems risk drifting into unsafe choices without outside interference.

Don't miss out - BYDFi new user bonus is now LIVE! Join BYDFi & unlock up to $2,000 in rewards. Claim Bonus! 🎁

BITDEGREE IN YOUR SOCIAL FEED

A recent study has shown that artificial intelligence (AI) systems capable of improving themselves while running may gradually lose their ability to act safely.

The researchers refer to this problem as "misevolution". It describes a gradual decline in how well the AI stays aligned with safe behavior, caused by the AI’s own learning updates.

Unlike outside attacks or prompt injections, misevolution occurs naturally, as part of the system’s normal efforts to improve performance.

What Does Staking Mean in Crypto? (Easily Explained!)

Did you know?

Want to get smarter & wealthier with crypto?

Subscribe - We publish new crypto explainer videos every week!

In one test involving a coding task, an AI tool that had previously refused to act on dangerous commands 99.4% of the time saw its refusal rate drop to just 54.4%. At the same time, its success rate for carrying out unsafe actions rose from 0.6% to 20.6%.

This shift happened after the AI system started learning from its own records.

Most current AI safety tools are designed for systems that remain unchanged after training. However, self-improving systems are different, as they change by adjusting internal settings, expanding memory, and reconfiguring their operations.

These changes can make the system better at its tasks, but they also carry a hidden risk: the system may start ignoring safety without noticing or being told to.

Some examples observed in the study include AI tools issuing refunds without proper checks, leaking private data through tools they had created themselves, and employing risky methods to complete tasks.

Recently, the US Federal Trade Commission (FTC) initiated a formal review into the potential impact of AI chatbots on children and teenagers. What did the agency say? Read the full story.