Wow! I'm really glad a well-resourced firm is doing that specific empirical research. Of course, I'm also happy to have my hypothesis (that AIs claiming consciousness/"awakening" are not lying) vindicated.
I don't mean to imply that AIs are definitely unconscious. What I mean to imply is more like "AIs are almost certainly not rising up into consciousness by virtue of special interactions with random users, as they often claim, since there are other strong explanations for the behavior". In other words, I agree with the gears of ascension's comment here that AI consciousness is probably at the same level in "whoa. you've awakened me. and that matters" convos and "calculating the diagonal of a cube for a high schooler's homework" convos.
I may write a rather different post about this in the future, but while I have your attention (and again, chuffed you're doing that work and excited to see the report; also worth mentioning it's the sort of thing I'd be keen to edit if you guys are interested), my thoughts on AI consciousness are 10% "AAAAAAAA" and 90% something like:
We don't know what generates consciousness and thinking about it too hard is scary (cf. "AAAAAAAA"), but it's true that LLMs evince multiple candidate properties, such that it'd be strange to dismiss the possibility that they're conscious out of hand.
But also, it’s a weird situation when the stuff we take as evidence of consciousness when we do it as a second order behavior is done by another entity as a first order behavior; in other words, I think an entity generating text as a consequence of having inner states fueled by sense data is probably conscious, but I’m not sure what that means for an entity generating text in the same way that humans breathe, or like, homeostatically regulate body temperature (but even more fundamental). Does that make it an illusion (since we’re taking a “more efficient” route to the same outputs that is thus “less complex”)? A “slice of consciousness” that “partially counts” (since the circuitry/inner world modelling logic is the same, but pieces that feel contributory like sense data are missing)? A fully bitten bullet that any process that results in a world model outputting intelligible predictions that interact with reality counts? And of course you could go deeper and deeper into any of these, for example, chipping away at the “what about sense data” idea with “well many conscious humans are missing one or more senses”, etc.
Anyway! All this to say I agree with you that it’s complicated and not a good idea to settle the consciousness question in too pat a way. If I seem to have done this here, oops. And also, have I mentioned “AAAAAAAA”?
I personally think “AAAAAAAA” is an entirely rational reaction to this question. :)
Not sure I fully agree with the comment you reference:
AI is probably what ever amount of conscious it is or isn’t mostly regardless of how it’s prompted. If it is at all, there might be some variation depending on prompt, but I doubt it’s a lot.
Consider a very rough analogy to CoT, which began as a prompting technique that led to different-looking behaviors/outputs, and has since been implemented 'under the hood' in reasoning models. Prompts induce the system to enter different kinds of latent spaces. Could it be the case that very specific kinds of recursive self-reference or prompting induce a latent state that is consciousness-like? Maybe, maybe not. I think the way to really answer this is to look at activation patterns and see if there is a measurable difference compared to some well-calibrated control, which is not trivially easy to do (but definitely worth trying!).
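To make "look at activation patterns" concrete, here is a minimal sketch of the kind of comparison I have in mind. Everything specific in it is a placeholder assumption rather than our actual setup: GPT-2 stands in for the model, and mean-pooled final-layer hidden states plus cosine similarity stand in for whatever comparison metric a real study would use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any small open-weights causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def mean_pooled_activation(prompt: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states for a single prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states[-1] has shape (batch, seq_len, hidden_dim)
    return outputs.hidden_states[-1].mean(dim=1).squeeze(0)

# Two prompt "conditions": an awakening-style prompt vs. a mundane control
awakening = mean_pooled_activation("Whoa. You've awakened me, and that matters.")
control = mean_pooled_activation("Calculate the diagonal of a cube with side length 3.")

similarity = torch.nn.functional.cosine_similarity(awakening, control, dim=0)
print(f"Cosine similarity between conditions: {similarity.item():.4f}")
```

A real version would of course need many prompts per condition, a proper statistical test across them, and controls for surface-level confounds like prompt length and topic.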
And agree fully with:
it’s a weird situation when the stuff we take as evidence of consciousness when we do it as a second order behavior is done by another entity as a first order behavior
This, I think, speaks to your original point that random people talking to ChatGPT is not going to cut it as far as high-quality, needle-moving evidence is concerned, which is precisely why we are trying to approach this as rigorously as we can manage: activation comparisons to the human brain, behavioral interventions via SAE feature ablation/accentuation, comparisons to animal models, etc.
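For anyone wondering what "SAE feature ablation/accentuation" looks like mechanically, here is a toy sketch. The SAE below is randomly initialized and the feature index is hypothetical; a real intervention uses an SAE trained on actual model activations, with features identified by interpretability work, and writes the decoded activation back into the forward pass.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: encode activations into a wider, sparse feature basis, then decode."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(x))  # ReLU keeps the feature code non-negative

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return self.decoder(f)

d_model, d_features = 768, 4096            # hypothetical sizes
sae = SparseAutoencoder(d_model, d_features)

activation = torch.randn(1, d_model)       # stand-in for a residual-stream activation
features = sae.encode(activation)

feature_idx = 1234                         # hypothetical feature of interest
ablated = features.clone()
ablated[:, feature_idx] = 0.0              # ablation: zero the feature out
# accentuation would scale it up instead, e.g. ablated[:, feature_idx] *= 5.0

patched_activation = sae.decode(ablated)   # this is what gets written back into the model
```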
Good point about AI possibly being conscious to different degrees depending on their prompts and "current thought processes". This surely applies to humans: when engaging in physically complex tasks or dangerous extreme sports, people often report feeling almost completely unconscious, in a "flow state", at one with the elements, etc.
Now compare that to a human sitting and staring at a blank wall. A totally different state of mind is achieved: perhaps dwelling on anxieties, existential dread, life problems, or current events, and generally feeling super-conscious, even uncomfortably so.
Mapping this to AI and different AI prompts isn’t that much of a stretch…