For clarification, is it actually a relatively intuitive thought that an agent will act more conservatively? Who has stated this? It seems this would only be the case if it had a deeper utility function that placed great weight on its ‘discovering’ its other utility function.
This isn’t actually necessary. If the agent has a prior over utility functions and some way of observing evidence about which one is real, you can construct the policy that maximizes expected utility in the following sense: imagine a utility function is sampled from the set of possibilities according to the prior probabilities, and that this sampled function is what the agent is scored on. This naturally gives rise to the instrumental goal of learning which utility function was sampled (i.e. which is the real one), since some observations provide evidence about that.
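To make that concrete, here is a toy sketch of my own (made-up numbers, and the simplifying assumption that a single observation perfectly reveals which utility function was sampled), showing how information-gathering falls out of expected-utility maximization over a prior on utility functions:

```python
# Two candidate utility functions over two actions; the agent doesn't know
# which one it will be scored on. Names and values are illustrative only.
candidate_utilities = {
    "U1": {"act_a": 1.0, "act_b": 0.0},   # U1 prefers action a
    "U2": {"act_a": 0.0, "act_b": 1.0},   # U2 prefers action b
}
prior = {"U1": 0.5, "U2": 0.5}

def expected_utility_act_now(action):
    """Expected score of committing to one action without gathering evidence."""
    return sum(prior[u] * candidate_utilities[u][action] for u in prior)

def expected_utility_observe_then_act(observation_cost=0.1):
    """Expected score of first observing which utility function was sampled
    (assumed perfectly informative here), then taking its best action."""
    value = sum(prior[u] * max(candidate_utilities[u].values()) for u in prior)
    return value - observation_cost

best_immediate = max(expected_utility_act_now(a) for a in ("act_a", "act_b"))
print("act now:          ", best_immediate)                        # 0.5
print("observe, then act:", expected_utility_observe_then_act())   # 0.9
```

The point is just that, whenever the candidate utility functions disagree about the best action by more than the cost of observing, the expected-utility-maximizing policy chooses to learn which one is real before acting; no deeper utility function about ‘discovering’ anything is needed.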