This came up in a recent interview as well in relation to reward hacking with OpenAI Deep Research. Sounds like they also had similar issues trying to make sure the model didn’t keep trying to hack its way to better search outcomes.