ariana_azarbal comments on Recontextualization Mitigates Specification Gaming Without Modifying the Specification

ariana_azarbal 15 Oct 2025 13:06 UTC
2 points
0
This is a very interesting prompting suggestion, and I’d like to test it! However, I don’t think recontextualization teaches the model that it can get away with misbehavior when it is encouraged to misbehave. We ran evaluations with this encouragement in the prompt (see the first appendix), and recontextualization still mitigates specification gaming there. In fact, we see the greatest relative decrease in spec gaming on these evals!