Casey Barkan comments on Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings

Casey Barkan 15 Jul 2025 5:44 UTC
4 points
2
Agreed that successful sandbagging will likely require Schelling coordination, and my guess is that this will be extremely difficult for models to pull off! Great to see that you’re investigating this topic.
- Graeme Ford 16 Jul 2025 1:55 UTC
  1 point
  0
  Indeed, current models are terrible at this! Still, worth keeping an eye on it, as it would complicate dangerous capability evals quite a bit should it emerge.