Hi, thanks for this interesting work! You may also be interested in our recent work, where we investigate whether internal linear probes (applied before an answer is produced) can predict whether a model is going to answer correctly: https://www.lesswrong.com/posts/KwYpFHAJrh6C84ShD/no-answer-needed-predicting-llm-answer-accuracy-from
We also compare these probes against verbalised self-confidence and find that the internals have more predictive power, so you could potentially apply internal probes to your setup.
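For concreteness, here is a minimal sketch of the probing idea: fit a logistic-regression probe on pre-answer hidden states to predict correctness. Everything below is illustrative — the hidden states are synthetic stand-ins (with an assumed linearly decodable correctness direction), not activations from our actual setup; in practice you would extract states from a chosen transformer layer before the answer tokens are generated.

```python
# Illustrative sketch: a linear probe on (synthetic) pre-answer hidden
# states predicting whether the model will answer correctly.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 1000, 64  # number of examples and hidden-state dimension (assumed)

# Synthetic "hidden states": correct-answer examples get a small shift
# along one fixed direction, mimicking a linearly decodable signal.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n)  # 1 = model answers correctly
states = rng.normal(size=(n, d)) + 1.5 * labels[:, None] * direction

X_tr, X_te, y_tr, y_te = train_test_split(states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = probe.score(X_te, y_te)  # well above the ~0.5 chance baseline
print(f"probe accuracy: {accuracy:.2f}")
```

The probe's held-out accuracy is then the quantity you would compare against a baseline built from the model's verbalised self-confidence scores.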