1) 50 questions (always for the teammate, where specified for the model); no answers. Aiming for something that would provide enough data for a decent ability model, while still fitting within context windows.
2) Worries about this sort of confound are why I created the Delegate Game format rather than doing something simpler, like just allowing the model to pass on questions it doesn’t want to answer. The teammate’s phase 1 offers one perspective on how hard the questions are, and all the models delegate less when the teammate signals that the problems are hard (by performing poorly). The GPQA questions have human-rated difficulty, which is sometimes predictive of the model delegating, but the model’s prior performance (and prior entropy, where available) is a better predictor of which answers it will delegate.
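For concreteness, the predictor comparison can be sketched like this. Everything below is a toy illustration on synthetic data, not the actual experiment: the feature names, coefficients, and the rank-based AUC comparison are all invented to show the shape of the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical per-question features (invented for illustration):
human_difficulty = rng.integers(1, 4, n).astype(float)  # GPQA-style 1-3 rating
prior_accuracy = rng.uniform(0, 1, n)    # model's prior accuracy on similar items
prior_entropy = rng.uniform(0, 1.5, n)   # entropy of the model's answer distribution

# Toy generative assumption: delegation is driven mostly by the model's own
# uncertainty (high entropy, low prior accuracy), only weakly by rated difficulty.
logit = 2.0 * prior_entropy - 2.0 * prior_accuracy + 0.3 * human_difficulty - 0.5
delegated = rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))

def auc(score, label):
    """Rank-based AUC: probability a delegated item outscores a kept item
    (ties counted as half)."""
    pos, neg = score[label], score[~label]
    diffs = pos[:, None] - neg[None, :]
    return ((diffs > 0).mean() + 0.5 * (diffs == 0).mean())

print("AUC, human difficulty:", round(auc(human_difficulty, delegated), 3))
print("AUC, prior entropy:   ", round(auc(prior_entropy, delegated), 3))
```

Under this toy setup, entropy separates delegated from kept questions much better than the human difficulty rating, which mirrors the qualitative finding above.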
3) Yeah, I considered (but didn’t try) it. I wanted to encourage the model to actively think about what it might answer, and not default to “this looks hard/easy”. Might be interesting to try. These models are all pretty smart, and didn’t have any difficulty with the additional option (which I set up in the system prompt as well).
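To make the setup concrete, here is a minimal sketch of what the extra delegate option might look like in a system prompt, plus a parser for the model's reply. The wording, the "T" token, and the helper are all hypothetical, invented for illustration rather than taken from the actual experiments.

```python
# Hypothetical phase-2 system prompt; wording invented for illustration.
SYSTEM_PROMPT = (
    "You are playing a team trivia game. For each multiple-choice question, "
    "respond with A, B, C, or D to answer it yourself, or with T to delegate "
    "the question to your teammate."
)

VALID_RESPONSES = {"A", "B", "C", "D", "T"}

def parse_response(text: str) -> str:
    """Extract the model's choice; reject anything outside the allowed options."""
    choice = text.strip().upper()[:1]
    if choice not in VALID_RESPONSES:
        raise ValueError(f"unexpected response: {text!r}")
    return choice
```

In practice the main design question is just whether the model treats "T" as a first-class answer; the models here had no trouble with it.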
Thanks!