ariaw comments on Steering RL Training: Benchmarking Interventions Against Reward Hacking

ariaw 8 Jan 2026 0:39 UTC
3 points
0
Sorry about that! The repository was updated a few days ago to fix this. Let me know if you have any further issues!
- Shashwat Saxena 10 Jan 2026 16:10 UTC
  1 point
  0
  Thanks!
  I was also running the code in the no_intervention setting, using the command run_rl_training no_intervention.
  However, I am seeing almost zero reward hacking in my run.
  Am I doing something wrong here?
  - ariaw 11 Jan 2026 21:17 UTC
    1 point
    0
    It’s hard for me to help without more information. I’ve responded to your email asking you to send some of the files created by training; I can try to help you debug from there.