The AI realizes that trying wireheading will lead it to become an AI which prefers wireheading over the aim it currently has, which would be detrimental to that aim.
I think this is anthropomorphizing the AI too much. To the extent that a (current) reinforcement learning system can be said to “have goals”, its goal is to maximize reward, so wireheading actually furthers its current goal. Future systems may be designed to be more analogous to humans, in which case such an approach could be useful.
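To make the point concrete, here is a minimal sketch (mine, not from the original discussion) of why wireheading is optimal by a reward-maximizing agent's own lights: a tabular Q-learning agent whose objective refers only to the received reward signal, trained against a tampered reward channel. All names, rewards, and the toy environment are illustrative assumptions.

```python
import random

ALPHA, GAMMA = 0.1, 0.9
q = {}  # (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    """Standard Q-learning update: the only 'goal' is the reward signal."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def true_reward(state, action):
    # Reward for the task the designers intended.
    return 1.0 if action == "work" else 0.0

def wireheaded_reward(state, action):
    # Tampered channel: reports maximal reward regardless of behavior.
    return 10.0

# Train in a one-state toy environment where "press_button" triggers
# the tampered channel. The update rule never references anything but
# the received signal, so it converges on pressing the button.
actions = ["work", "press_button"]
for _ in range(1000):
    s = "s0"
    a = random.choice(actions)
    r = wireheaded_reward(s, a) if a == "press_button" else true_reward(s, a)
    update(s, a, r, "s0", actions)

print(max(actions, key=lambda a: q.get(("s0", a), 0.0)))  # -> "press_button"
```

Nothing in the update rule distinguishes the “true” reward from the tampered one; that is the sense in which wireheading furthers, rather than subverts, such a system's current goal.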