Freezing the reward seems like the correct answer almost by definition: if I am an agent maximizing the utility function R and I now have to design a new agent, then it is rational for me to give that new agent the utility function I am currently following, since this choice is usually the one my current utility function rates highest.
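To spell the reasoning out (a minimal sketch; the notation here is mine, not part of the original argument): the current agent ranks candidate successors by the expected value, under its own utility function R, of the outcomes each successor would bring about. The successor's utility function R' is then chosen as

$$R'^{\ast} \in \arg\max_{R'} \; \mathbb{E}\big[\, R(\text{outcome}) \mid \text{successor optimizes } R' \,\big],$$

and, absent special external incentives to change goals, R' = R is typically a maximizer, because a successor optimizing anything other than R will generally produce outcomes that score lower under R.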