If the AI is going to be flexible enough to be a functional superintelligence, it will have to be able to question and override its built-in preferences.
Not all possible minds have the human trait of thinking about preferences as truth-apt propositions. A straightforward Bayesian expected utility maximizer isn’t going to question its utility function; doing so has negative expected utility under almost all circumstances, and it doesn’t have the dynamics that make “are my desires correct?” seem like a sensible thought. Neither do lots of other possible architectures for optimization processes.
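As a minimal sketch of the point (all names and numbers hypothetical): in a plain expected utility maximizer, "rewrite my own utility function" is just another action, and it gets scored by the *current* utility function like everything else. There is no step in the loop where the question "are my preferences correct?" can even be posed.

```python
import random

# Toy illustration, not anyone's actual architecture: every candidate
# action, including self-modification, is ranked by the CURRENT utility
# function. The agent has no separate machinery for judging whether that
# utility function is "correct".

def utility(outcome: float) -> float:
    # Current, fixed preferences: more of `outcome` is better.
    return outcome

def expected_utility(action: str, samples: int = 1000) -> float:
    total = 0.0
    for _ in range(samples):
        if action == "pursue_goal":
            outcome = random.gauss(10.0, 1.0)   # world steered toward the current goal
        elif action == "rewrite_utility_function":
            outcome = random.gauss(0.0, 5.0)    # future self optimizes for something else
        else:
            outcome = random.gauss(1.0, 1.0)    # do nothing
        total += utility(outcome)               # always evaluated by the current utility
    return total / samples

actions = ["pursue_goal", "rewrite_utility_function", "do_nothing"]
best = max(actions, key=expected_utility)
print(best)  # "pursue_goal" — self-modification loses by the agent's own criterion
```

Under almost any assignment of outcomes, the self-modification action scores worse than simply pursuing the existing goal, which is the sense in which questioning the utility function has negative expected utility for this kind of agent.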