This is such a great in-depth breakdown!!
I’m obsessed with this space; you can simulate so many interesting experiments with games.
We did a whole breakdown with Diplomacy, and the models differed dramatically in:
- Rates of betraying allies to win the game
- Sensitivity to the power they were playing
- Ability to handle such long context
Interestingly enough, and maybe unsurprisingly, the harness dramatically impacts model behavior, but each model responds to it differently. We also found that some models are much more susceptible to jailbreaks while playing.
Would love to chat about this further!