
ariaw

Karma: 50

MATS 9.0 Scholar with Neel Nanda
https://ariahw.github.io

Steering RL Training: Benchmarking Interventions Against Reward Hacking

29 Dec 2025 21:55 UTC
60 points
10 comments · 19 min read