So what happens when AIXI determines that there’s this large computer, call it BRAIN, whose outputs tend to correlate exactly with AIXI’s own outputs? AIXI may then discover the hypothesis that the observed effects of AIXI’s outputs on the world are really caused by BRAIN’s outputs. It may attempt to test this hypothesis by making some trivial modification to BRAIN so that its outputs differ from AIXI’s at some inconsequential time (not by dropping an anvil on BRAIN, because this would be very costly if the hypothesis is true). After verifying this, AIXI may then determine that various hardware improvements to BRAIN will cause its outputs to more closely match the theoretical Solomonoff Inductor, thus improving AIXI’s long-term payoff.
I mean, AIXI is waaaay too complicated for me to actually properly predict, but is this scenario actually so unreasonable?
I think that’s a reasonable scenario. AIXI will treat BRAIN the same way it would treat any other tool in its environment, like a shovel, a discarded laptop, or a remote-controlled robot. It can learn about BRAIN’s physical structure, and about ways to improve BRAIN.
The problem is that BRAIN will always be just a tool. AIXI won’t expect there to be any modification to BRAIN that can destroy AIXI’s input, output, or work streams, nor any modifications that are completely unprecedented in its own experience. You’ll be a lot more careful when experimenting on an object you think is you than when experimenting on an object you think is a useful toy. Treating your body as you means you can care about your bodily modifications without delusion, and you can make predictions about unprecedented changes to your mind by generalizing from the minds of other bodies you’ve observed.
Well, if AIXI believes that its interactions with the physical world are only due to the existence of BRAIN, it might not model the destruction of BRAIN as leading to the destruction of its input, output, and work streams (though in some sense this doesn’t actually happen, since these are idealized concepts anyway), but it does model that destruction as causing its output stream to no longer be able to affect its input stream, which seems like enough reason to be careful about making modifications.
Other possible implications of this scenario have been discussed on LW before.
I thought something similar.