My guess is you changed your mind on this? It seems pretty clear to me that GPT 5.5 and Opus 4.7 are just as reward-hacky as their predecessors, only substantially better at it (or like, they are much stronger apparent-success seekers, which IMO is clearly the reward they were trained on).
Noting again for the record that I would not be surprised if at some future stage, the model figures out what humans want to hear and see, errors and all, and then there is an apparent sudden amazing success with alignment.