Agreed. Humans are constantly optimizing a reward function, but it 'changes' from moment to moment in a near-focal way, so the behaviour often looks irrational or self-defeating. Once you know what the reward function is, though, the goal-directedness is easy to see.
Doesn’t this become tautological? If the reward function changes from moment to moment, then the reward function can just be defined, after the fact, as whatever explains the behaviour.
Would humans, or organizations of humans, make more progress towards whatever goals they have if they modified themselves to become utility maximizers? If so, why don’t they? If not, why would an AGI?
What would it mean to modify oneself to become a utility maximizer? What would it mean for the US, for example? The only meaning I can imagine is that one individual (for the sake of argument, assume this individual is already a utility maximizer) enforces his will on everyone else. Would that help the US make more progress towards its goals? Do countries that are closer to being utility maximizers, like North Korea, make more progress towards their goals?