There is a paper/essay/blogpost (maybe by Hanson) floating around somewhere that talks about this problem.
Basically, an AI might behave totally normally for a long time, but after reaching a computationally expensive state (like having spread out over a solar system), it might realize that the chances it is in a box it is capable of understanding are basically nil, and it could then switch to non-box-friendly strategies. I hope this description helps someone remember what I’m thinking of.
The point is that there are natural states a sufficiently advanced mind could reach that would lead it to assign a very low probability to the hypothesis that it is boxed.