Of course you can’t maximise the expected value of U by instead maximising the expected value of F (which, at this point in time, happens to empirically correlate with U). If the AI takes the action at t=0 which maximises the expected integral of Reward from t=0 to {far-future}, it can be accurately described as caring about (and only about) Reward.
If you want to make it care about Utility you have to program it to take the action at t=0 which maximises the expected integral of Utility from t=0 to {far-future}. Maximising “Utility”, or [the output of this box which implements Utility, I swear!], is not the same thing as maximising Utility itself. Replace the quotation with the referent, etc. Kind of like optimizing for [happiness, freedom, love, complex fun...] instead of “morality” (because you know what a babyeater would reply if you asked what “right” is).
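To make the contrast concrete, here is a rough sketch in AIXI-style notation (my own shorthand, not the original formalism: m is the horizon, ξ the environment mixture, r_t the reward signal at step t, and h_t the predicted history up to t):

\[
a_0^{\mathrm{reward}} = \operatorname*{arg\,max}_{a_0} \; \mathbb{E}_{\xi}\!\Big[\textstyle\sum_{t=0}^{m} r_t \,\Big|\, a_0\Big]
\qquad\text{vs.}\qquad
a_0^{\mathrm{utility}} = \operatorname*{arg\,max}_{a_0} \; \mathbb{E}_{\xi}\!\Big[\textstyle\sum_{t=0}^{m} U(h_t) \,\Big|\, a_0\Big]
\]

The first agent scores futures by the reward signal its environment feeds back; the second scores them by a utility function evaluated on the predicted world-history itself. They only coincide while r_t happens to track U.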
Of course, I don’t know how you’d actually program AIXI to do that, but it’s what would have to be done...