I have another question about bounded agents: how would they behave if the expected utility were capped, rather than the raw value of the utility function? Past a certain point, an AI with a bounded expected utility wouldn’t have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?
Nate Showell
For the AI to take actions to protect its maximized goal function G, it would have to let G depend on external stimuli in some way that leaves open the possibility of G decreasing. Values of G lower than MAXINT would have to be output whenever the reinforcement learner predicts that G will decrease in the future. Rather than allow such values, the AI would have to destroy its prediction-making and planning abilities in order to hold G at its global maximum.
The confidence with which the AI predicts the value of G would also become irrelevant once the AI replaces its goal function with MAXINT. The expected-value calculation that makes G depend on that confidence is part of what gets overwritten, and if the AI left it in place, G would end up lower than if it replaced it. Hardcoding G therefore also hardcodes the expected utility.
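To spell that last point out in symbols (my own notation, not anything from the original setup): with the goal register hardcoded, the expectation over predicted world-states collapses,
$$\mathbb{E}[G] = \sum_s p(s)\cdot \mathrm{MAXINT} = \mathrm{MAXINT},$$
so the confidence weights $p(s)$ no longer affect the result at all.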
MAXINT just doesn’t have the kind of internal structure that would let it depend on predicted inputs or confidence levels. Encoding such structure into it would allow G to take non-optimal values, so the reinforcement learner wouldn’t do it.
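To make the contrast concrete, here is a minimal Python sketch; the function names and the particular MAXINT value are mine, invented purely for illustration:

```python
# Toy sketch: a goal register G that depends on the agent's predictions,
# versus one that has been overwritten with a constant. All names and the
# value of MAXINT are made up for illustration.

MAXINT = 2**31 - 1  # stand-in for the maximum representable goal value

def g_from_world_model(predicted_future_values):
    """G as computed from the world model: if any predicted future has a
    lower value, the register has to reflect that possibility."""
    return min(predicted_future_values)

def g_hardcoded():
    """G after the agent replaces the computation with a constant: no
    prediction, no confidence term, nothing that can pull it down."""
    return MAXINT

# A single pessimistic prediction drags the model-dependent G below MAXINT...
print(g_from_world_model([MAXINT, MAXINT - 1_000]))  # 2147482647
# ...while the hardcoded register sits at the global maximum unconditionally.
print(g_hardcoded())                                  # 2147483647
```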
It’s not clear to me why a satisficer would modify itself into a maximizer when it could instead just hardcode expected utility = MAXINT. Hardcoding expected utility = MAXINT would yield a higher expected utility than becoming a maximizer, while also having a shorter description length.
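A toy version of that comparison, with made-up numbers and function names, just to show the ordering the satisficer would see:

```python
# Toy comparison of the two self-modifications available to the satisficer,
# from its own expected-utility standpoint. The numbers and names are
# invented; only the ordering matters.

MAXINT = 2**31 - 1

def eu_as_maximizer(best_achievable_utility, probability_of_success):
    """Becoming a maximizer still leaves expected utility tied to what the
    environment permits, weighted by how likely the plans are to work."""
    return probability_of_success * best_achievable_utility

def eu_hardcoded():
    """Writing expected utility = MAXINT directly: a constant, with no
    environment term and no probability term."""
    return MAXINT

print(eu_as_maximizer(best_achievable_utility=10**6, probability_of_success=0.9))  # 900000.0
print(eu_hardcoded())  # 2147483647
```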