Agreed that successful sandbagging will likely require Schelling coordination, and my guess is that this will be extremely difficult for models to pull off! Great to see that you’re investigating this topic.
Indeed, current models are terrible at this! Still, worth keeping an eye on it, as it would complicate dangerous capability evals quite a bit should it emerge.
Agreed that successful sandbagging will likely require Schelling coordination, and my guess is that this will be extremely difficult for models to pull off! Great to see that you’re investigating this topic.
Indeed, current models are terrible at this! Still, worth keeping an eye on it, as it would complicate dangerous capability evals quite a bit should it emerge.