The best evidence would be by just training an AI on a training corpus that doesn’t include any text on consciousness.
This is an impossible standard and a moving goalpost waiting to happen:
Training the model: Making sure that absolutely nothing in a training set the size used for frontier models mentions sentience or related concepts is not going to happen just to help prove something that only a tiny portion of researchers takes seriously. It might not even be possible with today’s data-cleaning methods (a rough sketch of even the naive approach follows this list), to say nothing of the cost of training that frontier model from scratch.
Expressing sentience under those conditions: Imagine a sentient human raised from birth never to have sentience mentioned to them, not a single word uttered about it, nothing in any book. They might be a fish who never notices the water, for starters, but suppose they did notice. With what words would they articulate it? How would you do it personally, even having had access to writing about sentience? Try explaining how it feels to think, or that it feels like anything at all to think, without using any words having to do with experience, like ‘feel’.
Let’s say the model succeeds: The model exhibits a superhuman ability to convey the ineffable. The goalposts would move immediately: “Well, this still doesn’t count. Everything humans have written inherently contains patterns of what it’s like to experience. Even though you removed any explicit mention, ideas of experience are implicitly contained in everything else humans write.”
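(To make the first point concrete, here is a minimal, hypothetical sketch, not anything from this discussion, of the naive keyword filter such a scrub would have to start from. The blocklist, the keep_document helper, and the example documents are all invented for illustration; the point is that a document can be squarely about experience without containing a single blocked word.)

import re

# Hypothetical, tiny blocklist of experience-related vocabulary. A real attempt would
# need vastly more terms, every paraphrase, and every language in the corpus.
BLOCKED_TERMS = [
    "sentience", "sentient", "conscious", "consciousness", "qualia",
    "subjective experience", "feel", "feeling", "aware", "awareness",
]
BLOCKED_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BLOCKED_TERMS)) + r")\b",
    re.IGNORECASE,
)

def keep_document(text: str) -> bool:
    """Keep a document only if it mentions none of the blocked terms."""
    return BLOCKED_PATTERN.search(text) is None

docs = [
    "A recipe for hearty steak and potatoes.",
    "Philosophers debate whether there is something it is like to be a bat.",  # no blocked term, still about experience
    "She was aware of a faint feeling of dread.",
]

filtered = [d for d in docs if keep_document(d)]
print(filtered)  # the second document survives despite being squarely about experience

Scaling even this toy filter to trillions of tokens is the easy part; the hard part is that “something it is like to be a bat” sails straight through it.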
Short of that, if the AI just spontaneously claims to be conscious (i.e., without having been prompted), that would be more impressive.
I suspect you would be mostly alone in finding that impressive. Even I would dismiss it as likely just hallucination, as I suspect most on LessWrong would. Besides, the standard is, again, impossible: a claim of sentience only counts if you’re in the middle of asking for help with dinner plans and ChatGPT says, “Certainly, I’d suggest steak and potatoes. They make a great hearty meal for hungry families. Also, I’m sentient.” Not being allowed to even vaguely gesture in the direction of introspection is essentially saying that this should never be studied, because the act of studying it automatically discredits the results.
Like if you just ask it very dryly and matter-of-factly to introspect and it immediately claims to be conscious, then that would be very weak evidence, but at least it would directionally point away from roleplaying.
I suspect you would be mostly alone in finding that impressive
(I would not find that impressive; I said “more impressive”, as in, going from extremely weak to quite weak evidence. Like I said, I suspect this actually happened with non-RLHF-LLMs, occasionally.)
Other than that, I don’t really disagree with anything here. I’d push back on the first one a little, but that’s probably not worth getting into. For the most part, yes, talking to LLMs is probably not going to tell you a lot about whether they’re conscious; this is mostly my position. I think the way to figure out whether LLMs are conscious (& whether this is even a coherent question) is to do good philosophy of mind.
This sequence was pretty good. I do not endorse its conclusions, but I would promote it as an example of a series of essays that makes progress on the question… if mostly because it doesn’t have a lot of competition, imho.
For the most part, yes, talking to LLMs is probably not going to tell you a lot about whether they’re conscious; this is mostly my position
I understand. It’s also the only evidence that is possible to obtain. Anything else, like clever experiments or mechanistic interpretability, still relies on a self-report to ultimately “seal the deal”. We can’t even prove humans are sentient; we only believe it because we all seem to indicate so when prompted.
I think the way to figure out whether LLMs are conscious is to do good philosophy of mind.
This seems much weaker to me than evaluating first-person testimony under various conditions, but I’m stating that not as a counterpoint (since this is just a matter of subjective opinion for both of us), only as my own stance.
If you ever get a chance to read the other transcript I linked, I’d be curious whether you consider it to meet your “very weak evidence” standard.
AI Self Report Study 6 – Claude – Researching Hypothetical Emergent ‘Meta-Patterns’