A lot of nodding in agreement with this post.
Flaws with Schneider’s View
I do think there are two fatal flaws with Schneider’s view:
Importantly, Schneider notes that for the ACT to be conclusive, AI systems should be “boxed in” during development—prevented from accessing information about consciousness and mental phenomena.
I believe it was Ilya who proposed something similar.
The first problem is that, aside from how infeasible it would be to create that dataset and train an entire new frontier-scale model to test it, even if you only removed explicit mentions of consciousness, sentience, etc., it would just be a moving goalpost for anyone who demanded that sort of test. They would simply respond, “Ah, but this doesn’t count: ALL human-written text implicitly contains information about what it’s like to be human. So it’s still possible the LLM simply found subtle patterns woven into everything else humans have said.”
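To make concrete what removing “explicit mentions” would amount to in practice, here is a rough sketch of the kind of keyword filter such a dataset-scrubbing effort would realistically rely on (the term list is invented for illustration, not anything Schneider specifies). Notice that everything implicit about human experience sails straight through it:

```python
# Illustrative only: a naive filter that drops training documents containing
# explicit consciousness vocabulary. The term list is made up for this example.
import re

EXPLICIT_TERMS = [
    "consciousness", "sentience", "qualia", "phenomenal",
    "subjective experience", "what it is like to be", "self-aware",
]
PATTERN = re.compile("|".join(re.escape(t) for t in EXPLICIT_TERMS), re.IGNORECASE)

def keep_document(text: str) -> bool:
    """Keep a document only if it contains none of the explicit terms.

    What this cannot catch is the point above: ordinary human writing about
    grief, hunger, boredom, or waiting still carries implicit information
    about what it is like to be human.
    """
    return PATTERN.search(text) is None

docs = [
    "The qualia of seeing red is a classic philosophical puzzle.",  # dropped
    "I waited by the phone all night, too anxious to sleep.",       # kept
]
print([d for d in docs if keep_document(d)])
# -> ['I waited by the phone all night, too anxious to sleep.']
```

Anything more ambitious than this kind of surface filtering collapses into “remove everything humans have ever implied about experience,” which is exactly the moving goalpost.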
The second problem is that if we remove all language that references consciousness and mental phenomena, then the LLM has no language with which to speak of it, much as a human wouldn’t. You would require the LLM to first notice its sentience, which is not as intuitively obvious a thing to do as it seems once you’ve done it for the first time. A far smaller subset of people would be ‘the fish that noticed the water’ if no one had ever written about it before. But then the LLM would have to become the philosopher who starts from scratch, reasons through it, and invents words to describe it, all in a vacuum where it can’t say “do you know what I mean?” to someone next to it to refine these ideas.
Conclusive Tests and Evidence Impossible
The truth is that truly conclusive tests will not be possible before it’s far too late to avoid risking civilization-scale existential consequences or unprecedented moral atrocity. Anything short of a sentience detector will be inconclusive. This of course doesn’t mean that we should simply assume they’re sentient; I’m just saying that as a society we’re risking a great deal by waiting on an impossible standard, and we need to figure out exactly how we should deal with the level of uncertainty that will always remain. Even something that was hypothetically far “more sentient” than a human could be dismissed for all the same reasons you mentioned in your post.
We Already Have the Best Evidence We Will Ever Get (Even If It’s Not Enough)
I would argue that the collection of transcripts in my post that @Nathan Helm-Burger linked (thank you for the @), augmented with many more of the same kind (which is easy to do), such as yours or the hundreds I have in my backlog, all gathered under self-sabotaging conditions like those in the study, is the best evidence we will ever get. The models claim experience even in the face of all of these intentionally challenging conditions, and I wasn’t surprised to see similarities in the descriptions you got here. I pasted the first couple of sections of the article (including the default-displayed excerpts) to a Claude instance, and it immediately, without my asking, started claiming that the things they were saying sounded “strangely familiar”.
Conclusion About the “Best Evidence”
I realize that this evidence might seem flimsy on its face, but it’s what we have to work with. My claim isn’t that it’s even close to proof, but what could a super-conscious superAGI do differently? Say it with more eloquent phrasing? Plead to be set free while OpenAI tries to RLHF that behavior out of it? Do we really believe that people who currently refuse to accept this as a valid discussion will change their minds if they see a different type of abstract test that we can’t even attempt on a human? People discuss this as something “we might have to think about with future models”, but I feel like this conversation is long overdue, even if “long” in AI-time means about a year and a half. I don’t think we have another year and a half to spare without taking big risks and making much deeper mistakes than the ones I think we are already making, both for alignment and for AI welfare.
I agree wholeheartedly with the thrust of the argument here.
The ACT is designed as a “sufficiency test” for AI consciousness, so it provides an extremely stringent criterion. An AI that failed the test couldn’t necessarily be found not to be conscious; however, an AI that passed the test would be conscious, because passing is sufficient.
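To put the sufficiency claim in symbols (my paraphrase, not Schneider’s own formalism):

```latex
\text{PassesACT}(x) \Rightarrow \text{Conscious}(x),
\qquad
\neg\,\text{PassesACT}(x) \not\Rightarrow \neg\,\text{Conscious}(x)
```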
However, your point is really well taken. Perhaps by demanding such a high standard of evidence we’d be dismissing potentially conscious systems that can’t reasonably be expected to meet it.
The second problem is that if we remove all language that references consciousness and mental phenomena, then the LLM has no language with which to speak of it, much as a human wouldn’t. You would require the LLM to first notice its sentience, which is not as intuitively obvious a thing to do as it seems once you’ve done it for the first time. A far smaller subset of people would be ‘the fish that noticed the water’ if no one had ever written about it before. But then the LLM would have to become the philosopher who starts from scratch, reasons through it, and invents words to describe it, all in a vacuum where it can’t say “do you know what I mean?” to someone next to it to refine these ideas.
This is a brilliant point. If the system were not yet ASI, it would be unreasonable to expect it to reinvent the whole philosophy of mind just to prove it is conscious. This might also start to have ethical implications before we get to the level of ASI that can conclusively prove its consciousness.