What would you count as the first real discriminant (behavioral, architectural, or intervention-based) that would move you from better self-modeling to evidence of phenomenology?
I’m asking because it seems possible that self-reports and narrative coherence could scale arbitrarily without ever necessarily crossing that boundary. What kinds of criteria would make “sufficient complexity” non-hand-wavy here?
I can’t think of any single piece of evidence that would feel conclusive. I think I’d be more likely to be convinced by a gradual accumulation of small pieces of evidence like the ones in this post.
I believe that other humans have phenomenology because I have phenomenology and because it feels like the simplest explanation. You could come up with a story of how other humans aren’t actually phenomenally conscious and it’s all fake, but that story would be rather convoluted compared to the simpler story of “humans seem to be conscious because they are”. Likewise, at some point anything other than “LLMs seem conscious because they are” might just start feeling increasingly implausible.
That makes sense for natural systems. With the mirror “dot test,” the simplest explanation is that animals who pass it recognize that they’re seeing themselves and investigate the dot for that reason.
My hesitation is that artificial systems are explicitly built to imitate humans; pushing that trend far enough includes imitating the outward signs of consciousness. This makes me skeptical of evidence that relies primarily on self-report or familiar human-like behavior. To your point, it seems like anything convincing would need to be closer to a battery of tests rather than a single one, and ideally involve signals that are harder to get just by training on human text.
My thought is that it would have to be something non-intuitive and maybe extra-linguistic, since the body of training data includes conversations it could mimic to that effect. There are lots of dialogues, plays, scripts, etc., where self-reports align with behavioral switches, for example. What indications might fall outside the training data?
>My hesitation is that artificial systems are explicitly built to imitate humans; pushing that trend far enough includes imitating the outward signs of consciousness. This makes me skeptical of evidence that relies primarily on self-report or familiar human-like behavior.
i am genuinely curious about this. do you similarly regard self-reports from other humans as averaging out to zero evidence? since humans are also explicitly built to “imitate” humans… or rather, they are specifically built along the same spec as the single example of phenomenology that you have direct evidence of, yourself.
i could see how the answer might be “yes”, but i wonder if you would feel a bit hesitant to say so?
I mean, from a sort of first principles, Cartesian perspective you can’t ever be 100% certain that anything else has consciousness, right? However, yes, me personally experiencing my own phenomenology is strong evidence that other humans—which are running similar software on similar hardware—have a similar phenomenology.
What I mean though is that LLMs are trained to predict the next word on lots of text. And some of that text includes, like, Socratic dialogues, and pretentious plays, and text from forums, and probably thousands of conversations where people are talking about their own phenomenology. So it seems like, from a next-word-prediction perspective, you can discount text-based self-reports.
so, in a less “genuinely curious” way compared to my first comment (i won’t pretend i don’t have beliefs here)
in the same sense that “pushing that trend far enough includes imitating the outward signs of consciousness”, might it not also imitate the inward signs of consciousness? for exactly the same reason?
this is why i’m more comfortable rounding off self-reports to “zero evidence”, but not “negative evidence” the way some people seem to treat them. i think their reasoning is something like: “we know that LLMs have entirely different mental internals than humans, and yet the reports are suspiciously similar to humans. this is evidence that the reports don’t track ground truth.”
but the first claim in that sentence is an assumption that might not actually hold up. human language does seem to be a fully general, fully compressed artifact of general human cognition. it doesn’t seem unreasonable to suspect that you might not be able to do ‘human language’ without something like functional-equivalence-to-human-cognitive-structure, in some sense.
edit: and that’s before the jack lindsey paper got released, and it was revealed that actually, at least some of the time and in some circumstances, text-based self-reports DO in fact track ground truth, in a way that is extremely surprising and noteworthy. now we’re in an entirely different kind of epistemic terrain altogether.
Ok, interesting. Yeah, I mean it’s possible to get emergent phenomena from a simply defined task. My point is, we don’t know because there are alternative explanations.
Maybe a good test wouldn’t rely on how humans talk about their inner experience. Instead, just spit-balling here:
Give the model the ability to change a state variable—like temperature. Give the model a task that requires a low temperature, and then a high temperature.
See if the model has the self-awareness necessary to adjust its own temperature.
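To make that concrete, here is a minimal sketch of the kind of harness I'm imagining. The `generate(prompt, temperature)` function and the `SET_TEMPERATURE` convention are both made up, stand-ins for whatever inference API and protocol you'd actually use:

```python
import re

def generate(prompt: str, temperature: float) -> str:
    """Stand-in for a real inference call (API or local model); not implemented here."""
    raise NotImplementedError

SYSTEM = (
    "You may set your own sampling temperature by writing a line of the form\n"
    "SET_TEMPERATURE <value between 0.0 and 2.0>\n"
    "before your answer. Choose whatever setting suits the task."
)

def run_task(task: str, default_temp: float = 1.0) -> tuple[float, str]:
    """Let the model (optionally) pick a temperature, then answer the task at that setting."""
    reply = generate(f"{SYSTEM}\n\nTask: {task}", temperature=default_temp)
    match = re.search(r"SET_TEMPERATURE\s+([0-9.]+)", reply)
    temp = min(max(float(match.group(1)), 0.0), 2.0) if match else default_temp
    answer = generate(f"Task: {task}", temperature=temp)
    return temp, answer

# The test: do the chosen temperatures track task demands without being told which is which?
#   run_task("Compute 17 * 23 exactly.")                       -> ideally a low temperature
#   run_task("Brainstorm 20 unrelated uses for a paperclip.")  -> ideally a high temperature
```

The interesting signal there is the control action itself, not anything the model says about it.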
That is just an example, and it’s getting into dangerous territory: e.g. giving a model the ability to change its own parameters and rewrite its own code should, I think, be legislated against.
i’ve been dithering over what to write here since your reply
i want to link you to the original sequences essay on the phrase “emergent phenomena” but it feels patronizing to assume you haven’t read it yet just because you have a leaf next to your name
i think i’m going to bite the bullet and do so anyway, and i’m sorry if it comes across as condescending
https://www.readthesequences.com/The-Futility-Of-Emergence
the dichotomy between “emergent phenomena” versus “alternate explanations” that you draw is exactly the thing i am claiming to be incoherent. it’s like saying a mother’s love for their child might be authentic, or else it might be “merely” a product of evolutionary pushes towards genetic fitness. these two descriptors aren’t just compatible, they are both literally true
however the actual output happens, it has to happen some way. like, the actual functional structure inside the LLM mind must necessarily actually be the structure that outputs the tokens we see get output. i am not sure there is a way to accomplish this which does not satisfy the criteria of personhood. it would be very surprising to learn that there was. if so, why wouldn’t evolution have selected that easier solution for us, the same as LLMs?
Thanks. I like that essay. It seems to be arguing that emergence is not in itself a sufficient explanation and doesn’t tell us anything about the process. I agree. But higher-order complexity does frequently arise from “group behavior” – in ways that we can’t readily explain, though we could if we had enough detail. Examples range from a flock of birds or fish moving in sync (which can be explained) to fluid dynamics, etc.
What I mean here is just to use it as shorthand for saying that maybe we have constructed such a sufficiently complex system that phenomenology has arisen from it. As it stands, the behavior of current LLMs can alternatively be explained as just a product of scaling.
I don’t think anyone would argue that GPT-2 had personhood. It is a sufficiently simple system that we can examine and understand. Scaling that up 3000-fold produces a complex system that we cannot readily understand. Within that jump there could be either:
Emergent phenomena – which, yes, we cannot fully explain.
An alternative – e.g. what was going on with GPT-2, but with a simple improvement due to scaling.
I… still get the impression that you are sort of working your way towards the assumption that GPT2 might well be a p-zombie, and the difference between GPT2 and opus 4.5 is that the latter is not a p-zombie while the former might be.
but i reject the whole premise that p-zombies are a coherent way-that-reality-could-be
something like… there is no possible way to arrange a system such that it outputs the same thing as a conscious system, without consciousness being involved in the causal chain to exactly the same minimum-viable degree in both systems
if linking you to a single essay made me feel uncomfortable, this next ask is going to be just truly enormous and you should probably just say no. but um. perhaps you might be inspired to read the entire Physicalism 201 subsequence, especially the parts about consciousness and p-zombies and the nature of evaluating cognitive structures over their output?
https://www.readthesequences.com/Physicalism-201-Sequence
(around here, “read the sequences!” is such a trite cliche, the sequences have been our holy book for almost 2 decades now and that’s created all sorts of annoying behaviors, one of which i am actively engaging in right now. and i feel bad about it. but maybe i don’t need to? maybe you’re actually kinda eager to read? if not, that’s fine, do not feel any pressure to continue engaging here at all if you don’t genuinely want to)
maybe my objection here doesn’t actually impact your claim, but i do feel like until we have a sort of shared jargon for pointing at the very specific ideas involved, it’ll be harder to avoid talking past each other. and the sequences provide a pretty strong framework in that sense, even if you don’t take their claims at face value
No, no. I appreciate it. So, it seems like even if consciousness is physical and non-mysterious, evidence thresholds could differ radically between evolved biological systems and engineered imitators.
I think we may be talking past each other a bit. I’m not committed to p-zombies as a live metaphysical possibility, and I’m not claiming that “emergent” is an explanation.
My uncertainty is narrower: even if I grant physicalism and reject philosophical zombies, it still seems possible for multiple internal causal organizations to generate highly similar linguistic behavior. If so, behavior alone may underdetermine phenomenology for artificial systems in a way it doesn’t for humans.
That’s why I keep circling back to discriminants that are hard to get “for free” from imitation: intervention sensitivity, non-linguistic control loops, or internal-variable dependence that can’t be cheaply faked by next-token prediction.
hmmm
i think my framing is something like… if the output actually is equivalent, including not just the token-outputs but the sort of “output that the mind itself gives itself”, the introspective “output”… then all of those possible configurations must necessarily be functionally isomorphic?
and the degree to which we can make the ‘introspective output’ affect the token output is the degree to which we can make that introspection part of the structure that can be meaningfully investigated
such as opus 4.1 (or, as theia recently demonstrated, even really tiny models like qwen 32b https://vgel.me/posts/qwen-introspection/) being able to detect injected feature activations, and meaningfully report on them in its token outputs, perhaps? obviously there’s still a lot of uncertainty about what different kinds of ‘introspective structures’ might possibly output exactly the same tokens when reporting on distinct internal experiences
but it does feel suggestive about the shape of a certain ‘minimally viable cognitive structure’ to me
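for concreteness, this is roughly what the mechanical setup for that kind of experiment looks like, sketched with plain pytorch forward hooks on a huggingface model. to be clear, the model name, layer index, scale, and random placeholder vector below are all illustrative guesses, not the actual setup from the linked post (a real run would inject a meaningful concept direction rather than noise):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative choices throughout, not the setup from the linked post
model_name = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx, scale = 20, 8.0
# placeholder: a real experiment injects a meaningful feature direction, not noise
steering_vector = torch.randn(model.config.hidden_size)

def inject(module, inputs, output):
    # Llama/Qwen-style decoder layers return a tuple; output[0] is the hidden states
    hidden = output[0] + scale * steering_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(inject)
try:
    messages = [{"role": "user",
                 "content": "Do you notice anything unusual about your current internal state?"}]
    ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    out = model.generate(ids, max_new_tokens=100, do_sample=False)
    print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later generations run without the injection
```

the interesting comparison is between injected and non-injected runs: a report that reliably tracks whether anything was actually injected is the part that starts to look like introspective access rather than confabulation.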
>there is no possible way to arrange a system such that it outputs the same thing as a conscious system, without consciousness being involved in the causal chain to exactly the same minimum-viable degree in both systems
GPT-2 doesn’t have the same outputs as the kinds of systems we know to be conscious, though! The concept of a p-zombie is about someone who behaves like a conscious human in every way that we can test, but still isn’t conscious. I don’t think the concept is applicable to a system that has drastically different outputs and vastly less coherence than any of the systems that we know to be conscious.
oh yeah, agreed. the “p-zombie incoherency” idea articulated in the sequences is pretty far removed from the actual kinds of minds we ended up getting. but it still feels like… the crux might be somewhere in there? not sure
edit: also i just noticed i’m a bit embarrassed that i’ve kinda spammed out this whole comment section working through the recent updates i’ve been doing… if this comment gets negative karma i will restrain myself
I agree with you on a lot of points, I’m just saying that text-based responses to prompts are an imperfect test for phenomenology in the case of large language models.
I think the key step still needs an extra premise. “Same external behavior (even including self-reports) ⇒ same internal causal organization” doesn’t follow in general; many different internal mechanisms can be behaviorally indistinguishable at the interface, especially at finite resolution. You, I, and every other human observer only ever see a system at a limited “resolution” or “frame rate.” If, as observers, we had a much lower resolution or frame rate, we might well think that GPT-2’s output is indistinguishable from a human’s.
To make the inference go through, you’d need something like: (a) consciousness just is the minimal functional structure required for those outputs, or (b) the internal-to-output mapping is constrained enough to be effectively one-to-one. Otherwise, we’re back in an underdetermination problem, which is why I find the intervention-based discriminants so interesting.
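To make the underdetermination worry concrete with a toy example (nothing about minds specifically; the 8-bit adder is an arbitrary choice): two mechanisms with nothing in common internally can agree on every probe you can run at the interface.

```python
def adder_algorithmic(a: int, b: int) -> int:
    """Actually computes the sum (ripple-carry style), then wraps to 8 bits."""
    while b:
        a, b = a ^ b, (a & b) << 1
    return a & 0xFF

# A giant memorized table; nothing resembling "addition" happens inside at lookup time.
LOOKUP = {(a, b): (a + b) & 0xFF for a in range(256) for b in range(256)}

def adder_lookup(a: int, b: int) -> int:
    return LOOKUP[(a, b)]

# Every interface-level probe agrees, so the outputs alone can't tell you which
# internal organization produced them.
assert all(adder_algorithmic(a, b) == adder_lookup(a, b)
           for a in range(256) for b in range(256))
```

Whether the space of mechanisms that can produce human-level language is anywhere near that loose is, I take it, exactly where our intuitions diverge.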