[Question] Is it rational to modify one’s utility function?

Rationality is often informally defined in terms of means-end reasoning or utility maximization. This idea becomes less clear, however, when an agent has the option of modifying its own utility function. Does rationality prescribe avoiding any change to one's current utility function, on the grounds that such a change would typically reduce expected utility as measured by the current function? Or does it prescribe taking whatever actions yield the highest utility, in which case a change is rational iff the new utility function yields higher expected utility given known background information about the world?
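To make the two readings concrete, here is a minimal sketch that contrasts them directly. Everything in it is a hypothetical illustration I made up for this question (the `Utility` class, the single-peaked toy world model, the sampling scheme), not a standard decision-theory formalism: the "conservative" rule evaluates the consequences of adopting a new function under the *current* function, while the "permissive" rule compares each function's expected utility under itself, given the same background world model.

```python
import random

# Toy formalization of the two readings in the question. All names and the
# toy world model are hypothetical illustrations, not a standard formalism.

class Utility:
    """A simple single-peaked utility function over outcomes in [0, 1]."""
    def __init__(self, peak, scale=1.0):
        self.peak = peak      # which outcome this function most prefers
        self.scale = scale    # how much utility it assigns near its peak
    def __call__(self, outcome):
        return self.scale * (1.0 - abs(outcome - self.peak))

def sample_outcomes(optimized_u, n):
    """Hypothetical world model: an agent optimizing a utility function tends
    to steer outcomes toward that function's peak, with some noise."""
    return [min(1.0, max(0.0, random.gauss(optimized_u.peak, 0.2)))
            for _ in range(n)]

def expected_utility(evaluating_u, optimized_u, n=10_000):
    """Estimate E[evaluating_u(outcome)] when the agent optimizes optimized_u."""
    outcomes = sample_outcomes(optimized_u, n)
    return sum(evaluating_u(o) for o in outcomes) / n

def conservative_switch(current_u, new_u):
    """Reading 1: judge the switch only by the current function. Adopting
    new_u is rational only if the resulting behavior scores at least as
    well under current_u as keeping current_u would."""
    return expected_utility(current_u, new_u) >= expected_utility(current_u, current_u)

def permissive_switch(current_u, new_u):
    """Reading 2: compare each function's expected utility under itself,
    given the same background world model; switch iff the new function
    yields higher expected utility."""
    return expected_utility(new_u, new_u) > expected_utility(current_u, current_u)

if __name__ == "__main__":
    current = Utility(peak=0.2, scale=1.0)
    easier_to_satisfy = Utility(peak=0.8, scale=2.0)  # candidate that is simply easier to score well on
    print("conservative says switch:", conservative_switch(current, easier_to_satisfy))
    print("permissive says switch:  ", permissive_switch(current, easier_to_satisfy))
```

On these toy numbers the conservative rule rejects the switch while the permissive rule accepts it, which mirrors the divergence the question is asking about.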

This is obviously relevant to AI alignment, where one concern is that an AI might hack its own utility function, and another is that it might prevent humans from modifying it (or shutting it off) because doing so threatens its current goals. It is also relevant to questions of human rationality: on the one hand, we imagine that Gandhi would refuse a pill that made him want to murder people; on the other, we regularly believe that unhappy people should change their own psychology and goals in order to become happier.
