‘Second, some of these capabilities are quite far from paradigm human introspection. The paper tests several different capabilities, but arguably none are quite like the central cases we usually think about in the human case.’
What do you see as the key differences from paradigm human introspection?
Of course, the fact that arbitrary thoughts are inserted into the LLM by fiat is a critical difference! But once we accept that core premise of the experiment, the capabilities tested seem to have the central features of human introspection, at least when considered collectively.
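For readers who haven't seen the paper, here is a minimal sketch of what 'inserting a thought by fiat' means mechanically, assuming an activation-injection setup broadly like the paper's: a concept direction is added to the model's hidden activations at one layer, and the model is then asked to report on its internal state. Everything here (the model, the layer index, the scale, the random placeholder vector) is illustrative, not the paper's actual configuration; in a real experiment the vector would be a meaningful concept direction extracted from the model's own activations, not random noise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper studies much larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

LAYER = 6    # assumption: which transformer block to perturb
SCALE = 8.0  # assumption: injection strength
# Placeholder "thought": a real setup would use an extracted concept direction.
concept_vec = torch.randn(model.config.hidden_size)

def inject(module, inputs, output):
    # A GPT-2 block returns a tuple; the hidden states are output[0].
    # Adding the concept vector here is the "thought inserted by fiat".
    hidden = output[0] + SCALE * concept_vec.to(output[0])
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = "Do you notice anything unusual about your current thoughts?"
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # restore the unperturbed model
```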
I won’t pretend to much familiarity with the philosophical literature on introspection (much less on AI introspection!), but the Stanford Encyclopedia of Philosophy entry (https://plato.stanford.edu/entries/introspection/#NeceFeatIntrProc) lists three ~universally agreed necessary features of introspective processes, and all three seem pretty clearly met by this experiment.
In talking with a number of people about this paper, it’s become clear that people’s intuitions differ on the central usage of ‘introspection’. For me and at least some others, its primary meaning is something like ‘accessing and reporting on current internal state’, and as I see it, that’s exactly what’s being tested by this set of experiments.
Some more informal comments, this time copied from a comment I left on @Robbo’s post about the paper, ‘Can AI systems introspect?’.