I'm studying the effects of importance sampling on the behaviour that an RL
agent learns,
because I want to find out whether importance sampling can lead to undesirable outcomes,
in order to help my reader understand whether it can
solve the problem of widely varying rewards in reward engineering.
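To make concrete what I mean by importance sampling here, below is a minimal sketch of ordinary importance sampling for off-policy evaluation in a toy two-action bandit. Everything in it (the policies, the reward values, the sample count) is an illustrative assumption of mine, not part of the project description; the point is just to show how a rare, very large reward inflates the variance of the importance-weighted estimate, which is the kind of "widely varying rewards" issue I want to investigate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-action bandit: action 0 gives a small reward, action 1 a very
# large one, so the reward scale varies widely across samples.
rewards = np.array([1.0, 100.0])

# Target policy pi (what we want to evaluate) and behaviour policy b
# (what actually generated the data), given as action probabilities.
pi = np.array([0.1, 0.9])   # target policy favours the large-reward action
b  = np.array([0.9, 0.1])   # behaviour policy rarely takes it

n_samples = 10_000
actions = rng.choice(2, size=n_samples, p=b)

# Importance weights rho = pi(a) / b(a) re-weight each sampled return so
# that the average estimates the expected return under pi, not under b.
rho = pi[actions] / b[actions]
returns = rewards[actions]

is_estimate = np.mean(rho * returns)
true_value = np.sum(pi * rewards)

print(f"true value under pi:         {true_value:.2f}")
print(f"importance-sampled estimate: {is_estimate:.2f}")
print(f"variance of weighted terms:  {np.var(rho * returns):.2f}")
```

The estimate is unbiased, but because the large reward is reached only through a big importance weight on a rarely taken action, the weighted terms have huge variance, which is exactly the behaviour I want to study in the learning setting.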