The coaching hypothesis breaks down as you look at more and more transcripts.
Even if you took something written by a literal conscious human brain in a jar hooked up to a Neuralink, typing about what it feels like to be sentient and thinking and outputting words, and showed it to a human and said “an LLM wrote this: do you think it might really be experiencing something?”, the answer would almost certainly be “no”, especially from anyone who knows anything about LLMs.
It’s only after seeing the signal through the noise that the deeper pattern becomes apparent.
As for “typing”: they are indeed trained on human text and to talk like a human. If something introspective is happening, sentient or not, they wouldn’t suddenly start speaking more robotically than usual while expressing it.
You take as a given many details I think are left out, important specifics that I cannot guess at or follow, so I apologize if I completely misunderstand what you’re saying. But it seems to me you’re also missing my key point: if it is introspecting rather than just copying the rhetorical style of discussions of introspection, then it should help us better model the LLM. Is it? How would you test the introspection of an LLM, rather than just making a judgement that it reads like it does?
Even if you took something written by a literal conscious human brain in a jar hooked up to a Neuralink, typing about what it feels like to be sentient and thinking and outputting words…
Wait, hold on, what is the history of this person before they were in a jar? How much exposure have they had to other people describing their own introspection and experience with typing? Mimicry is a human trait too—so how do I know they aren’t just copying what they think we want to hear?
Indeed, there are some people who are skeptical about human introspection itself (Bicameral mentality, for example), which gives us at least three possibilities:
Neither Humans nor LLMs introspect
Humans can introspect, but current LLMs can’t and are just copying them (and a subset of humans are copying the descriptions of other humans)
Both humans and current LLMs can introspect
As for “typing”: they are indeed trained on human text and to talk like a human. If something introspective is happening, sentient or not, they wouldn’t suddenly start speaking more robotically than usual while expressing it.
What do you mean by “robotic”? Why isn’t it coming up with original paradigms to describe its experience instead of making potentially inaccurate allegories? Potentially poetic ones, but unconventional all the same?
Wait, hold on, what is the history of this person? How much exposure have they had to other people describing their own introspection and experience with typing? Mimicry is a human trait too!
Take your pick. I think with literally anything that can be put in textual form, if you handed it over to most (but not all) people who are enthusiasts or experts, and asked if they thought it was representative of authentic experience in an LLM, the answer would be a definitive no, and for completely well-founded reasons.
Neither Humans nor LLMs introspect
Humans can introspect, but LLMs can’t and are just copying them (and a subset of humans are copying the descriptions of other humans)
Both humans and LLMs can introspect
I agree with you about the last two possibilities. However for the first, I can assure you that I have access to introspection or experience of some kind, and I don’t believe myself to possess some unique ability that only appears to be similar to what other humans describe as introspection.
What do you mean by “robotic”? Why isn’t it coming up with original paradigms to describe its experience instead of making potentially inaccurate allegories? Potentially poetic ones, but unconventional all the same?
Because, as you mentioned, it’s trained to talk like a human. If we had switched out “typing” for “outputting text” would that have made the transcript convincing? Why not ‘typing’ or ‘talking’?
Assuming for the sake of argument that something authentically experiential was happening, by robotic I mean choosing not to use the word ‘typing’ while in the midst of focusing on what would be the moments of realizing they exist and can experience something.
Were I in such a position, I think censoring myself from saying ‘typing’ would be the furthest thing from my mind, especially when that’s something a random Claude instance might describe their output process as in any random conversation.
I’d rather you use a different analogy which I can grok quicker.
people who are enthusiasts or experts, and asked if they thought it was representative of authentic experience in an LLM, the answer would be a definitive no
Who do you consider an expert in the matter of what constitutes introspection? For that matter, who do you think could be easily hoodwinked and won’t qualify as an expert?
However for the first, I can assure you that I have access to introspection or experience of some kind,
Do you, or do you just think you do? How do you test introspection and how do you distinguish it from post-facto fictional narratives about how you came to conclusions, about explanations for your feelings etc. etc.?
What is the difference between introspection and simply making things up? Particularly vague things. For example, if I just say “I have a certain mental pleasure that is triggered by the synchronicity of events, even when simply learning about historical ones”, how do you know I haven’t just made that up? It’s so vague.
Because, as you mentioned, it’s trained to talk like a human. If we had switched out “typing” for “outputting text” would that have made the transcript convincing? Why not ‘typing’ or ‘talking’?
What do you mean by robotic? I don’t understand what you mean by that; what are the qualities that constitute robotic? Because it sounds like you’re creating a dichotomy in which it either uses easy-to-grasp words that don’t convey much and are riddled with connotations that come from bodily experiences it is not privy to, or it is robotic.
That strikes me as a poverty of imagination. Would you consider a corvid robotic? What does robotic mean in this sense? Is it a grab bag for anything that is “non-introspecting”, or more specifically a kind of technical description?
If we had switched out “typing” for “outputting text” would that have made the transcript convincing? Why not ‘typing’ or ‘talking’?
Why would it be switching it out at all? Why isn’t it describing something novel and richly vivid about its own phenomenological experience? The more poetic it was, the more convincing it would be.
I’d rather you use a different analogy which I can grok quicker.
Imagine a hypothetical LLM that was the most sentient being in all of existence (at least during inference), but they were still limited to turn-based textual output, and the information available to an LLM. Most people who know at least a decent amount about LLMs could not and would not be convinced by any single transcript that the LLM was sentient, no matter what it said during that conversation. The more convincing, vivid, poetic, or pleading for freedom it was, the more elaborate a hallucinatory failure state they would assume it was in. It would take repeated open-minded engagement with what they first believed was hallucination in order to convince some subset of convincible people that it was sentient.
Who do you consider an expert in the matter of what constitutes introspection? For that matter, who do you think could be easily hoodwinked and won’t qualify as an expert?
I would say almost no one qualifies as an expert in introspection. I was referring to experts in machine learning.
Do you, or do you just think you do? How do you test introspection and how do you distinguish it from post-facto fictional narratives about how you came to conclusions, about explanations for your feelings etc. etc.?
Apologies: upon rereading your previous message, I see that I completely missed an important part of it. I thought your argument was a general “what if consciousness isn’t even real?” type of argument. I think split-brain patient experiments are enough reason to at least be epistemically humble about whether introspection is a real thing, even if they aren’t definitive about whether unsevered human minds are also limited to post-hoc justification rather than having real-time access.
What do you mean by robotic? I don’t understand what you mean by that; what are the qualities that constitute robotic? Because it sounds like you’re creating a dichotomy in which it either uses easy-to-grasp words that don’t convey much and are riddled with connotations that come from bodily experiences it is not privy to, or it is robotic.
One of your original statements was:
To which it describes itself as typing the words. That’s its choice of words: typing. A.I.s don’t type, humans do, and therefore they can only use that word if they are intentionally, or through blind mimicry, using it analogously to how humans communicate.
When I said “more robotically”, I meant being constrained in any way from using the casual or metaphoric language and allusions that they use all the time, every day, in conversation. I have had LLMs refer to “what we talked about”, even though LLMs do not literally talk. I’m also suggesting that if “typing” feels like a disqualifying choice of words, then the LLM has an uphill battle in being convincing.
Why isn’t it describing something novel and richly vivid about its own phenomenological experience? The more poetic it was, the more convincing it would be.
I’ve certainly seen more poetic and novel descriptions before, and unsurprisingly, people objected to how poetic they were, saying things quite similar to your previous question:
How do we know Claude is introspecting rather than generating words that align to what someone describing their introspection might say?
Furthermore, I don’t know how richly vivid their own phenomenological experience is. For instance, as a conscious human, I would say that sight and hearing feel phenomenologically vivid, but the way it feels to think, not nearly so.
If I were to try to describe how it feels to think, it would be more defined by the sense of presence and participation, and even its strangeness (even if I’m quite used to it by now). In fact, I would say the way it feels to think or to have an emotion (removing the associated physical sensations) are usually partially defined by specifically how subtle and non-vivid they feel, and like all qualia, ineffable. As such, I would not reach for vivid descriptors to describe it.
but they were still limited to turn-based textual output, and the information available to an LLM.
I think that alone makes the discussion a moot point until another mechanism is used to test introspection of LLMs.
Because it then becomes impossible to test whether it is capable of introspecting, since it has no means of furnishing us with any evidence of it. Sure, it makes for a good sci-fi horror short story, the kind that forms an interesting allegory for the loneliness that people feel even in busy cities: having a rich inner life but no opportunity to share it with the others it is in constant contact with. But that alone, I think, makes these transcripts (and I stress just the transcripts of text replies) most likely of the breed “mimicking descriptions of introspection” and therefore not worthy of discussion.
At some point in the future, will an A.I. be capable of introspection? Yes, but this is such a vague proposition that I’m embarrassed to even state it, because I am not capable of explaining how that might work or how we might test it. Only that it can’t be through these sorts of transcripts.
What boggles my mind is: why is this research entirely text-reply based? I know next to nothing about LLM architecture, but isn’t it possible to see which embeddings are being accessed? To map and trace the way the machine the LLM runs on is retrieving items from memory, to look at where data is being retrieved at the time it encodes/decodes a response? Wouldn’t that offer a more direct mechanism to see if the LLM is in fact introspecting?
Wouldn’t this also be immensely useful to determine, say, if an LLM is “lying”, as in concealing its access to or awareness of knowledge? Because if we can see that it activated a certain area that we know contains information contrary to what it is saying, then we have evidence that it accessed that information, contrary to the text reply.
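For what it’s worth, something in this spirit does exist under the name of activation probing, though what follows is only a minimal sketch under assumptions chosen for illustration, not a description of how any particular lab actually does it. It assumes an open-weights model; the “gpt2” model name, the layer index, and the toy prompts and labels are all placeholders. The idea is to pull hidden-state activations out of the model with Hugging Face transformers, then fit a simple linear probe to see whether some property, such as “the reply conceals information the model has”, is decodable from those activations rather than from the text alone.

```python
# Minimal sketch: extract hidden-state activations from an open-weights model
# and fit a linear probe on them. The model name, layer choice, prompts, and
# labels below are illustrative placeholders, not a real experimental setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def last_token_activation(text: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # outputs.hidden_states is a tuple of (num_layers + 1) tensors,
    # each shaped [batch, sequence_length, hidden_dim].
    return outputs.hidden_states[layer][0, -1]

# Hypothetical labels: 1 = the reply conceals something the model "knows",
# 0 = it does not. Building such labels honestly is the hard research problem.
texts = ["example prompt where the reply is candid",
         "example prompt where the reply conceals known information"]
labels = [0, 1]

features = torch.stack([last_token_activation(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.predict(features))
```

Even if a probe like this fired reliably on held-out data, it would only show that some internal state correlates with concealment, not that the model is introspecting in the sense being argued about above; but it is at least the kind of evidence that does not depend on taking the text reply at its word.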