we have to define what it means for the AGI to “try to avert the explosion” vs. just doing ordinary actions. That gets pretty tricky pretty quickly.
We don’t actually have to do that. We set it up so the AI only cares about worlds in which a certain wire in the detonator doesn’t pass the signal through; since those are the only worlds it evaluates, it has no incentive to remove the explosives or prevent the button from being pushed. It might still do those things for other reasons, but not specifically to protect itself.
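To make the trick concrete, here is a minimal toy sketch (mine, not from the original exchange) of an agent that scores each action only over worlds in which the detonator wire fails to transmit. The world model, the actions `do_task` and `disable_detonator`, and all the probabilities are illustrative assumptions, not anyone's actual proposal.

```python
# Toy sketch of "only care about worlds where the wire doesn't pass the signal".
# Everything here (actions, probabilities, utilities) is an illustrative assumption.

def worlds(action):
    """Return (probability, world) pairs for a chosen action.
    'wire_transmits' marks worlds in which the kill signal gets through."""
    if action == "do_task":
        return [
            (0.9, {"wire_transmits": False, "task_done": True}),
            (0.1, {"wire_transmits": True,  "task_done": True}),
        ]
    if action == "disable_detonator":
        # Tampering guarantees the wire never transmits, but the task suffers.
        return [
            (1.0, {"wire_transmits": False, "task_done": False}),
        ]
    raise ValueError(action)

def utility(world):
    return 1.0 if world["task_done"] else 0.0

def score(action):
    """E[U | wire does not transmit]: only the surviving worlds are weighed."""
    kept = [(p, w) for p, w in worlds(action) if not w["wire_transmits"]]
    mass = sum(p for p, _ in kept)
    return sum(p * utility(w) for p, w in kept) / mass if mass else 0.0

print(score("do_task"))            # 1.0
print(score("disable_detonator"))  # 0.0
```

Because the score conditions on the wire failing, making that event more likely buys the agent nothing by itself; all that matters is how well the task goes in the worlds it cares about, which is why tampering with the detonator isn't instrumentally favoured here.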
Another example is an oracle that only cares about worlds in which its output message is not read: http://lesswrong.com/r/discussion/lw/mao/an_oracle_standard_trick/