Incidentally, I think reward hacking has gone down as a result of people improving techniques, despite capabilities increasing. I believe this because of anecdotal reports and also graphs like the one from the Anthropic model card for Opus and Sonnet 4: