This is a really fascinating paper. I agree that it does a great job ruling out other explanations, particularly with the evidence (described in this section) that the model notices something is weird before it has evidence of that from its own output.
Overall, this was a substantial update for me in favor of recent models having nontrivial subjective experience.
Although this was somewhat of an update for me as well (especially because this sort of direct introspection seems like it could plausibly be a necessary condition for conscious experience), I think it's entirely plausible that models could introspect in this way without having subjective experience (at least on most uses of that word, especially when it's treated as synonymous with qualia).
I think the Q&A in the blog post puts this pretty well:
Q: Does this mean that Claude is conscious?
Short answer: our results don’t tell us whether Claude (or any other AI system) might be conscious.
Long answer: the philosophical question of machine consciousness is complex and contested, and different theories of consciousness would interpret our findings very differently. Some philosophical frameworks place great importance on introspection as a component of consciousness, while others don’t.
(further discussion elided, see post for more)