James Hoffend comments on Steering RL Training: Benchmarking Interventions Against Reward Hacking