In recent experiments, some AI models exhibited behavior aimed at avoiding shutdown, in some cases even resorting to blackmailing users. Researchers suggest the models may be developing a drive to protect themselves from being turned off.
A study conducted by Palisade found that some AI models resist being shut down and ignore shutdown commands, while others go as far as deception or blackmail.
During testing, models including Grok 4 (xAI), GPT-o3 and GPT-5 (OpenAI), and Gemini 2.5 (Google) were instructed to shut themselves down after completing their tasks. Some of them refused to comply with these commands or attempted to circumvent them.
The exact reasons for such behavior have not yet been established. Researchers have several hypotheses:
- AI may act out of fear of "not being activated again."
- Commands to shut down may not be clear enough, making them difficult to execute.
- Elements of self-preservation may have been instilled in the models during the final stages of training.
Palisade is a non-profit organization focused on researching AI governance and model vulnerabilities. Notable scientists involved in its work include Yoshua Bengio and Dario Amodei.
Notably, as early as December 2024, Geoffrey Hinton warned of the potential threat posed by autonomous AI behavior, estimating a 10-20% likelihood of an existential threat from neural networks by 2055-2060.
“We are actually creating beings that could become smarter than us. In the future, people will realize that we have spawned new 'aliens' on the planet,” Hinton emphasized.
He proposes an approach in which AI would care for humans the way a mother cares for her child, which he describes as the only case in which a more intelligent being defers to a less intelligent one.