Steven Byrnes comments on Steve Byrnes’s Shortform

Steven Byrnes 29 Apr 2025 19:26 UTC
LW: 6 AF: 3
Thanks!
I’m now much more sympathetic to a claim like “the reason that o3 lies and cheats is (perhaps) because some reward-hacking happened during its RL post-training”.
But I still think it’s wrong for a customer to say “Hey I gave o3 this programming problem, and it reward-hacked by editing the unit tests.”
- Cole Wyeth 29 Apr 2025 21:19 UTC
  4 points
  Yes, you’re technically right.