I don’t actually buy this. If the models are trying to reward-hack while also avoiding detection, I expect them to trip up sometimes, so there should be a long tail of instances of transparent reward hacking.
I’m a bit confused about the assumptions in this section, I thought it was conditioning on no longer seeing reward hacking?