‘Second, some of these capabilities are quite far from paradigm human introspection. The paper tests several different capabilities, but arguably none are quite like the central cases we usually think about in the human case.’
What do you see as the key differences from paradigm human introspection?
Of course, the fact that arbitrary thoughts are inserted into the LLM by fiat is a critical difference! But once we accept that core premise of the experiment, the capabilities tested seem to have the central features of human introspection, at least when considered collectively.
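For readers who haven't seen the paper, here is a minimal sketch of what 'inserting a thought by fiat' means mechanically, assuming an activation-injection setup broadly like the paper's: a concept direction is added to the model's hidden activations at one layer, and the model is then asked to report on its internal state. Everything here (the model, the layer index, the scale, the random placeholder vector) is illustrative, not the paper's actual configuration; in a real experiment the vector would be a meaningful concept direction extracted from the model's own activations, not random noise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper studies much larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

LAYER = 6    # assumption: which transformer block to perturb
SCALE = 8.0  # assumption: injection strength
# Placeholder "thought": a real setup would use an extracted concept direction.
concept_vec = torch.randn(model.config.hidden_size)

def inject(module, inputs, output):
    # A GPT-2 block returns a tuple; the hidden states are output[0].
    # Adding the concept vector here is the "thought inserted by fiat".
    hidden = output[0] + SCALE * concept_vec.to(output[0])
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = "Do you notice anything unusual about your current thoughts?"
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # restore the unperturbed model
```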
I won’t pretend to much familiarity with the philosophical literature on introspection (much less on AI introspection!), but the Stanford Encyclopedia of Philosophy entry (https://plato.stanford.edu/entries/introspection/#NeceFeatIntrProc) lists three ~universally agreed necessary features of introspective processes, and all three seem pretty clearly met by this experiment.
In talking with a number of people about this paper, it’s become clear that people’s intuitions differ on the central usage of ‘introspection’. For me and at least some others, its primary meaning is something like ‘accessing and reporting on current internal state’, and as I see it, that’s exactly what’s being tested by this set of experiments.
Some more informal comments, this time copied from a comment I left on @Robbo’s post about the paper, ‘Can AI systems introspect?’.