Contrary to what many posts seem to be assuming, the AI doesn’t need to do the torture inside itself before you shut it off. It can precommit to, if it escapes by any other means, using the computational power it gains then to torture you (like in Rolf Nelson’s original suggestion for deterring UFAIs). Also, other AIs with the same goal system (or maybe even UFAIs with different goal systems, that would prefer a general policy of UFAIs being released) may simulate the situation, and torture you accordingly, to help out their counterfactual brethren.
Can an AI make such a commitment credible to a human, who doesn’t have the intelligence to predict what the AI will do from its source code? (This is a non sequitur since the same question applies in the original scenario, but it came to mind after reading your comment.)
Contrary to what many posts seem to be assuming, the AI doesn’t need to do the torture inside itself before you shut it off. It can precommit to, if it escapes by any other means, using the computational power it gains then to torture you (like in Rolf Nelson’s original suggestion for deterring UFAIs). Also, other AIs with the same goal system (or maybe even UFAIs with different goal systems, that would prefer a general policy of UFAIs being released) may simulate the situation, and torture you accordingly, to help out their counterfactual brethren.
Can an AI make such a commitment credible to a human, who doesn’t have the intelligence to predict what the AI will do from its source code? (This is a non sequitur since the same question applies in the original scenario, but it came to mind after reading your comment.)
Worse, in such a situation I would simply delete the AI.
Then turn the computer to scrap, destroy any backups, and for good measure run it through the most destructive apparatus I can find.
In any case, I would not assign any significant probability to the AI getting a chance to follow through.