I dunno, this seems like the sort of thing LLMs would be quite unreliable about—e.g. they’re real bad at introspective questions like “How did you get the answer to this math problem?” They are not model-based, let alone self-modeling, in the way that encourages generalizing to introspection.
I agree, and the linked analysis agrees too: LLMs do not have the same feedback mechanisms for learning such state descriptions. But something like “feelings of confidence” is arguably something the model could represent.