But an AI with that programming is predictable, and, much worse, manipulable! To get it to do anything, you need only inform it that you predicted it will not do that thing*. It’s just a question of how long it takes people to realize that it behaves this way. It is far weaker than an AI that sometimes behaves as predicted and sometimes does not. Consider, e.g., Alicorn’s sandwich example: if we imagine an AI that needed to eat (a silly idea, but it demonstrates the point), you don’t want it to refuse to do so simply because someone predicted that it would (which anyone easily could).
*This raises the question of whether the AI will realize that you are in fact secretly predicting that it will do the opposite. But once you consider that the AI would then have to keep track of the probabilities of people’s true (rather than merely claimed) predictions, I think it becomes clear that this is just a silly thing to be implementing in the first place. Especially because even if people didn’t go up to it and say “I bet you’re going to try to keep yourself alive”, they would still be implicitly predicting that simply by expecting it.
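To make the manipulability point concrete, here is a minimal sketch (all names hypothetical, assuming a simple two-option choice) of an agent hard-coded to defy any stated prediction. Anyone who wants a particular behavior just "predicts" the opposite:

```python
# Toy sketch: an agent that always defies the claimed prediction
# is trivially steerable by announcing the opposite of what you want.

def defiant_agent(claimed_prediction: str, options: tuple) -> str:
    """Always picks an option other than the one predicted."""
    remaining = [o for o in options if o != claimed_prediction]
    return remaining[0]

def manipulate(desired: str, options: tuple) -> str:
    """Steer the agent toward `desired` by 'predicting' something else."""
    decoy = next(o for o in options if o != desired)
    return defiant_agent(decoy, options)

options = ("eat", "refuse")
# Announce "I predict you will refuse" -> the defiant agent eats.
assert manipulate("eat", options) == "eat"
```

With more than two options the defiance rule underdetermines the choice, but the point stands: the agent's behavior is a pure function of what others claim to predict, which is exactly the weakness described above.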
But once you have played a couple of games of ‘paper, scissors rock’ I think it becomes clear that this is just a silly thing to be implementing in the first place.
Yes, that as well. Such an AI would, it seems offhand, be playing a perpetual game of Poisoned Chalice Switcheroo to no real end.