Victor Gillioz comments on Recontextualization Mitigates Specification Gaming Without Modifying the Specification

Victor Gillioz 21 Oct 2025 15:07 UTC
3 points
That could be an interesting variation! One thing I’m wondering about: with environment recontextualization, the model will appear completely oblivious to hacking opportunities in the resulting instruction/completion pairs, which might have surprising generalization effects. To some extent, I think this is related to concerns about degraded instruction following, because the model’s behavior ends up disconnected from its input.