Good point. May I ask, is “explicit utility function” standard terminology, and if so, is there a good reference somewhere that explains it? It took me a long time to realize the interesting difference between humans, who engage in moral philosophy and often can’t tell you what their goals are, and my model of paperclippers. I also think that not understanding this difference is a big reason why people don’t understand the orthogonality thesis.
No, I do not believe that it is standard terminology, though you can find a decent reference here.
They’re often called explicit goals, not utility functions. “Utility function” is a term from a very specific moral philosophy.
Also note that the orthogonality thesis depends on an explicit goal structure. Without such an architecture it should be called the orthogonality hypothesis.