Rohin Shah comments on Reward model hacking as a challenge for reward learning