The informal part of your opening sentence really hurts here. Humans don’t have time-consistent (or in many cases self-consistent) utility functions. It’s not clear whether AI could theoretically have such a thing, but let’s presume it’s possible.
The confusion comes in having a utility-maximizing framework to describe “what the agent wants”. If you want to change your utility function, that implies that you don’t want what your current utility function says you want. Which means it’s not actually your utility function.
You can add epicycles here—a meta-utility-function that describes what you want to want, probably at a different level of abstraction. That makes your question sensible, but also trivial—of course your meta-utility function wants to change your utility function to more closely match your meta-goals. But then you have to ask whether you’d ever want to change your meta-function. And you get caught recursing until your stack overflows.
Much simpler and more consistent to say “if you want to change it, it’s not your actual utility function”.
The informal part of your opening sentence really hurts here. Humans don’t have time-consistent (or in many cases self-consistent) utility functions. It’s not clear whether AI could theoretically have such a thing, but let’s presume it’s possible.
The confusion comes in having a utility-maximizing framework to describe “what the agent wants”. If you want to change your utility function, that implies that you don’t want what your current utility function says you want. Which means it’s not actually your utility function.
You can add epicycles here—a meta-utility-function that describes what you want to want, probably at a different level of abstraction. That makes your question sensible, but also trivial—of course your meta-utility function wants to change your utility function to more closely match your meta-goals. But then you have to ask whether you’d ever want to change your meta-function. And you get caught recursing until your stack overflows.
Much simpler and more consistent to say “if you want to change it, it’s not your actual utility function”.