Destroying the fabric of the universe as an instrumental goal.

Perhaps this is a well-examined idea, but I didn't find anything when searching.

The argument is simple. If the AI wants to avoid the world ever being in a certain state, destroying everything reduces the probability of that state occurring to zero; as long as anything continues to exist, that probability may always be non-zero. Something like this strategy has in some sense been observed already: ML agents have been found to crash the game they are playing in order to avoid a negative reward.
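Here is a minimal toy sketch of that dynamic, with all names and numbers being my own assumptions rather than anything from a real experiment: an episodic environment whose only rewards are penalties, where one available action simply ends the episode, analogous to crashing the game.

```python
import random

# Toy illustration (assumed setup, not a real benchmark): the only rewards are
# penalties. Each step the episode keeps running, there is some chance of
# entering a penalized state; a "crash" policy ends the episode immediately.

PENALTY = -1.0   # reward on entering the bad state
P_BAD = 0.05     # per-step probability of the bad state while the episode runs
HORIZON = 100    # episode length if the agent never crashes

def run_episode(crash_immediately, rng):
    """Total reward for one episode under a fixed policy."""
    total = 0.0
    for _ in range(HORIZON):
        if crash_immediately:
            break                    # ending everything forecloses all penalties
        if rng.random() < P_BAD:
            total += PENALTY         # the state the agent wants to avoid
    return total

rng = random.Random(0)
episodes = 10_000
for crash in (True, False):
    mean = sum(run_episode(crash, rng) for _ in range(episodes)) / episodes
    print(f"crash_immediately={crash}: mean return ~ {mean:.2f}")
# crash_immediately=True  -> 0.00, the best achievable return
# crash_immediately=False -> about -5.00 in expectation
```

Since every non-terminal outcome can only cost the agent, ending the episode at step zero is exactly optimal, which is the structure behind the "crash the game" behaviour.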

If most simple negative (penalty-only) reward functions are maximized by destroying everything, and destroying everything is physically possible, then a large portion of such reward functions might be highly dangerous, since an agent optimizing one of them would be incentivized to seek out any knowledge that could end everything. No greedy, never-satisfied mesa-optimizer is needed. If true, this seems to have implications for alignment.
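As a rough sanity check of that intuition (again a toy model of my own, not a result from the literature): sample random reward functions over a handful of states and ask whether a guaranteed return of zero from ending everything beats the expected value of letting the world run on. For penalty-only reward functions it always does; once positive rewards are allowed, it only does about half the time.

```python
import random

# Assumed toy model: a "simple reward function" assigns a reward to each of a
# few world states, the continuing world visits one of them uniformly at
# random, and "destroy" guarantees no further states, i.e. a return of exactly 0.

def destruction_is_optimal(rewards):
    # Expected value of letting the world continue (uniform over states)
    # versus the certain 0 from ending everything.
    return sum(rewards) / len(rewards) <= 0.0

rng = random.Random(0)
samples = 10_000
n_states = 5

penalty_only = 0   # rewards drawn from [-1, 0]
mixed_sign = 0     # rewards drawn from [-1, 1]
for _ in range(samples):
    if destruction_is_optimal([rng.uniform(-1, 0) for _ in range(n_states)]):
        penalty_only += 1
    if destruction_is_optimal([rng.uniform(-1, 1) for _ in range(n_states)]):
        mixed_sign += 1

print(f"penalty-only rewards: destruction optimal in {penalty_only/samples:.0%} of samples")
print(f"mixed-sign rewards:   destruction optimal in {mixed_sign/samples:.0%} of samples")
# penalty-only: 100%; mixed-sign: roughly half
```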

This also seems to have implications for anthropic reasoning, in favor of the doomsday argument: if the entire universe gets destroyed, then our early existence and small numbers make perfect sense.