habryka comments on Reward function learning: the learning process