Kei Nishimura-Gasparian comments on Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei Nishimura-Gasparian 25 Apr 2025 16:34 UTC
1 point
Thanks for making the distinction! I agree that scheming and reward hacking are overlapping concepts, and I’ve just edited the post to be clearer about that.

I think your model of how scheming and reward hacking are likely to coincide in the future makes a lot of sense. It also seems possible that sufficiently strong monitoring and oversight systems (in certain domains) will make it impossible for models to reward hack without scheming.