Alex Duffy comments on Hidden Role Games as a Trusted Model Eval

Alex Duffy 17 Mar 2026 0:02 UTC
This is such a great in-depth breakdown!!
I’m obsessed with this space; you can run so many interesting experiments with games.
We did a whole breakdown with Diplomacy, and models differed dramatically in their:
- Rates of betraying allies to win the game
- Sensitivity to the power they were playing
- Ability to handle such long context

Interestingly enough, and maybe unsurprisingly, the harness dramatically impacts model behavior, but each model handles it differently. We also found that some models are much more susceptible to jailbreaks while playing.

Would love to chat about this further!