Nate Showell comments on Reinforcement Learner Wireheading

Nate Showell 9 Jul 2022 19:47 UTC
1 point
For the AI to take actions to protect its maximized goal function, the goal function would have to depend on external stimuli in a way that leaves open the possibility of G decreasing. It would then have to output values of G lower than MAXINT whenever the reinforcement learner predicts that G will decrease in the future. Instead of allowing such values, the AI would have to destroy its prediction-making and planning abilities to set G to its global maximum.
The confidence with which the AI predicts the value of G would also become irrelevant after the AI replaces its goal function with MAXINT. The expected value calculation that makes G depend on the confidence is part of what would get overwritten, and if the AI didn’t replace it, G would end up lower than if it did. Hardcoding G also hardcodes the expected utility.
MAXINT just doesn’t have the kind of internal structure that would let it depend on predicted inputs or confidence levels. Encoding such structure into it would allow G to take non-optimal values, so the reinforcement learner wouldn’t do it.
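As a rough illustration of the argument above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the names `structured_goal` and `hardcoded_goal`, the representation of predictions as (probability, value) pairs, and the choice of `2**31 - 1` as a stand-in for MAXINT. It is not meant as a model of any real reinforcement learner, only a way to show that a goal function with enough structure to react to predictions can fall below MAXINT, while a hardcoded one cannot.

```python
# Arbitrary stand-in for the largest value G can take (an assumption).
MAXINT = 2**31 - 1


def structured_goal(predicted_outcomes):
    """Hypothetical goal function that still depends on predictions.

    predicted_outcomes is a list of (probability, value) pairs; the
    result is the probability-weighted expected value of G.
    """
    return sum(p * v for p, v in predicted_outcomes)


def hardcoded_goal(predicted_outcomes):
    """Hypothetical goal function after being overwritten with MAXINT.

    The predictions are ignored entirely, so confidence cannot matter.
    """
    return MAXINT


# A prediction that G might drop: 90% chance of MAXINT, 10% chance of 0.
risky = [(0.9, MAXINT), (0.1, 0)]
# A fully confident prediction that G stays at its maximum.
safe = [(1.0, MAXINT)]

# The structured version falls below MAXINT under the risky prediction.
print(structured_goal(risky) < MAXINT)
# The hardcoded version returns MAXINT whatever is predicted.
print(hardcoded_goal(risky) == hardcoded_goal(safe) == MAXINT)
```

Under these assumptions, the only way for `structured_goal` to register a threat is to return something less than MAXINT, which is exactly the outcome the hardcoded version rules out.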