The credibility of the threat depends on how strong the AI is now and how strong I expect it to be in the future.
This type of threat is something like young Stalin promising me that he won’t torture my family in the future if I support his early rise to power.
From your description, it doesn’t sound like the AI could have already boxed me from the perspective of the initial timeline (assuming that my mind had not yet been scanned, and that being in a box means it doesn’t yet have the massive powers required to resimulate my causal history).
So it boils down to the question: am I in a simulation created by this AI at some point in the future, after it has broken free and taken over the world?
It seems rational to assign a confidence in pressing the button proportional to my estimate of the AI’s chance of succeeding unaided.
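One rough way to formalize that proportionality (my own sketch, not part of the original setup): write p for the AI’s chance of escaping without my help, and suppose that if it escapes it runs N faithful simulations of this moment. Under a simple self-locating count of copies,

\[
P(\text{I am in one of its simulations}) \;\approx\; \frac{pN}{pN + 1},
\]

which rises from 0 toward 1 as p grows, so the weight I give the threat should roughly track my estimate of p. This obviously leans on contestable anthropic assumptions, but it makes the dependence on p explicit.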
I’d also need to consider its trustworthiness, design, and history; I’d probably ask what positive thing it could do for me; and I’d weigh the likelihood of other AIs of similar or greater power existing and how it would interact with them.
This thought experiment might be more interesting if contrasted with the positive version: the AI promises that, once it takes over the world, it will simulate thousands (millions seems excessive) of universes just like this moment, and in each one where you serve it well, it will later send you to paradise.