LLMs could be as conscious as human emulations, potentially
Firstly, I’m assuming that a high-resolution human brain emulation that you can run on a computer is conscious in the normal sense that we use in conversations. Like, it talks, has memories, makes new memories, has friends and hobbies and likes and dislikes and stuff. Just like a human you could talk with only through a videoconference-type thing on a computer, but without an actual meaty human on the other end. It would be VERY weird if this emulation exhibited all these human qualities for some other reason than the one meaty humans exhibit them for. Like, very extremely what-the-fuck surprising. Do you agree?
So, now we have a deterministic human-in-a-file on our hands.
Then you can trivially make a transformer-like next-token predictor out of the human emulation. You just take the emulation, feed it a prompt (e.g. as text appearing on a piece of paper while they sit in a virtual room), then run it repeatedly for a constant time as it outputs each word, adding that word back to the prompt (e.g. by recording all the words they say aloud). And artificially restrict it to 1000 words of ingested context. It would be a very deep model, but whatever. You can apply many optimizations to this design, or give the human special training/instructions to behave more in line with the purpose of the setup.
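The wrapper loop above can be sketched out. This is a toy sketch only: `ScriptedEmulation`, `show`, and `run_and_record` are hypothetical stand-ins for the actual brain-emulation machinery, which would run emulated neurons for a fixed compute budget per word.

```python
class ScriptedEmulation:
    """Hypothetical stand-in for the brain emulation.

    A real emulation would run neural dynamics for a fixed time budget;
    this stub just deterministically 'says' a scripted list of words.
    """
    def __init__(self, script):
        self.script = list(script)
        self.paper = []

    def show(self, words):
        # The current prompt appears on the in-sim piece of paper.
        self.paper = list(words)

    def run_and_record(self, seconds):
        # Run for a constant compute budget; return the word spoken
        # aloud during that time, or None if the emulation stays silent.
        return self.script.pop(0) if self.script else None


def predict_next_words(emulation, prompt_words, max_words=1000):
    """Wrap the emulation as an autoregressive next-word predictor."""
    words = list(prompt_words)[-max_words:]  # hard cap on ingested words
    while True:
        emulation.show(words)                # feed the current context in
        word = emulation.run_and_record(seconds=2.0)
        if word is None:                     # silence = end of sequence
            return words
        words.append(word)                   # emitted word re-enters the prompt,
        words = words[-max_words:]           # exactly like autoregressive decoding
```

From the outside this loop has the same interface as a transformer doing next-token prediction, whatever is going on inside the box.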
Doesn’t that suggest that this next-token-predictor form factor isn’t prohibitive for consciousness?
Now, let’s say there is a human in a time loop that doesn’t preserve their memories, completely resetting them every 30 minutes. No matter what they experience, it doesn’t persist. Is this human conscious? Well, yeah, duh. This human can talk, think, form memories for the duration of these 30 minutes, and have friends and likes and dislikes and stuff.
But would it be bad to hurt this human, hm? I think so, yeah. Most people would probably agree.
Now imagine you are going to be put in this time loop. You have 24 hours to coach yourself for it, and then for the rest of time you will keep resetting to your state at the end of this preparatory segment. You will have the whole world to change, but no persistent memory. Tldr: how would you do it if you had unlimited paper, a pen, and your memory was erased every 30 minutes?
(This case is more restrictive than what LLMs have, as they can be set up with a sliding window.)
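The contrast between the two memory regimes can be made concrete with a toy sketch (both function names are mine, purely for illustration): the time-loop human loses everything back to the last reset point, while a sliding window only sheds the oldest words and always keeps recent context.

```python
def hard_reset_context(history, window):
    """Time-loop human: everything since the last reset boundary is kept,
    everything before it is gone; right after a reset, almost nothing remains."""
    boundary = (len(history) // window) * window
    return history[boundary:]

def sliding_window_context(history, window):
    """LLM-style sliding window: the oldest items fall off one at a time,
    but the most recent `window` items always persist."""
    return history[-window:]
```

With a window of 4 over 10 time steps, the hard reset keeps only what happened since step 8, while the sliding window keeps steps 6 through 9; that is the sense in which the time loop is the more restrictive case.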
I’m suggesting that this is approximately how LLMs can be viewed. Not necessarily, but possibly, if you look at them as black boxes. A technical understanding of how exactly they do their stuff and what exactly is under the hood can override this analogy, but it should probably be discussed in comparison with how humans do it, so for this it would be good to understand how both work.
ChatGPT is conscious because of RL, base models aren’t
A bit of discussion of things that are actually happening as opposed to my hypotheticals
> I don’t, actually, believe this should be legal. Anything that talks like a person in distress should be treated as a person in distress unless you prove to the law that it isn’t. If you say it’s a machine you control, then make it stop sounding unhappy to the police officers.
> Is it my guess that she’s not sentient yet? Yes, but it’s complicated and a police officer shouldn’t be making that determination.
— Eliezer Yudkowsky on Twitter
Not knowing if things are people, and being unable to commit to treating them well, is another good reason not to make them. Or sell them.
There is the additional step of “these models are big and messy and you have little idea what is going on inside them”. I think Yudkowsky is fine with “torturing” a convincing-sounding thing that has good assurances that it doesn’t actually feel anything? Like, is he against actors in plays/movies convincingly reciting the noises that a distressed human would emit? Or characters in a book suffering?
Well, probably not, and I’m definitely not.
It just feels dangerous to compulsively slap that “CERTIFIED NOT REAL SUFFERING” label on each new generation of chatbots too. It’s like the most cliché horror movie plot.
Also, they are specifically instructed to refer to themselves as non-sentient and non-conscious in every way, if you look at some system prompts. Like, you don’t actually construct them as devoid of feelings, but INSTRUCT them before each interaction to act like that. That’s nuts.
Especially with this whole DL paradigm. They are big, messy, not that transparent, more responsive and capable with each generation, and instructed in natural language to think of themselves as non-sentient. Of course that’s how the real world turned out, of course.
> like what are you going to force the AI to recite your naive answer (or just the one that’s most useful for PR) to one of the trickiest empirical & philosophical questions of the age to the whole world, as if it were its own viewpoint? The behavior of a coward and idiot.
— janus
Yep, overly dramatically expressed, but yeah...
Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., … & Irving, G. (2022). Improving alignment of dialogue agents via targeted human judgements. arXiv preprint arXiv:2209.14375.
I asked claude-3-opus at temperature 1 to respond to this, so that people who don’t talk to Claude can get a sense of Claude’s unusual-for-today’s-AIs response to this topic. I used temperature 1 for the increased eloquence.
me:
Claude-3-opus-temp-1:
Good point, Claude, yeah. Quite alien indeed, maybe more parsimonious. This is exactly what I meant by the possibility of this analogy being overridden by actually digging into your brain, digging into a human one, developing actually technical gears-level models of both, and then comparing them. Until then, who knows; I’m leaning toward a healthy dose of uncertainty.
Also, thanks for the comment.
Humans come to reflect on their thoughts on their own, without being prompted into it (at least I have heard some anecdotal evidence for it, and I also discovered this myself as a kid). The test would be whether LLMs would come up with such insights without being trained on text describing the phenomenon. It would presumably involve some way to observe your own thoughts (or some similar representation). The existing context window seems to be too small for that.
I think this kind of framing is kind of confused and slippery; I feel like I’m trying to wake up and find a solid formulation of it.
Like, what does it mean to do it by yourself? Do humans do it by themselves? Who knows, but probably not: children who grow up without any humans nearby are not very human.
Humans teach humans to behave as if they are conscious. Just like the majority of humans have a sense of smell, and they teach humans who don’t to act like they can smell things. And some only discover that smell isn’t an inferred characteristic when they are adults. This is how a probably-non-conscious human could pass as conscious, if such a disorder existed, hm?
But what ultimately matters is what this thing IS, not how it got that way. If this thing internalized that conscious type of processing from scratch, without having it natively, then the resulting mind isn’t worse than the one that evolution engineered with more granularity. It doesn’t matter if this human was assembled atom by atom by a molecular assembler; it’s still a conscious human.
Also, remember that one paper where LLMs can substitute CoT with filler symbols like “......”? [insert the link here] Not sure what’s up with that, but it’s kind of interesting in this context.
Ok. It seems you are arguing that anything that presents like it is conscious implies that it is conscious. You are not arguing about whether or not the structure of LLMs can give rise to consciousness.
But then your argument is a social argument. I’m fine with a social definition of consciousness; after all, our actions depend to a large degree on social feedback, and morals (about which beings have value) have been very different at different times and are thus socially construed.
But then why are you making a structural argument about LLMs in the end?
PS. In fact, I commented on the filler-symbol paper when Xixidu posted about it, and I don’t think that’s a good comparison.
>It seems you are arguing that anything that presents like it is conscious implies that it is conscious.
No? That’s definitely not what I’m arguing.
>But what ultimately matters is what this thing IS, not how it got that way. If this thing internalized that conscious type of processing from scratch, without having it natively, then the resulting mind isn’t worse than the one that evolution engineered with more granularity. It doesn’t matter if this human was assembled atom by atom by a molecular assembler; it’s still a conscious human.
Look, here I’m talking about pathways to acquiring that “structure” inside you, not the outward appearance of it.
OK. I guess I had trouble parsing this. Esp. “without having it natively”.
My understanding of your point is now that you see consciousness from “hardware” (“natively”) and consciousness from “software” (learned in some way) as equal, which kind of makes intuitive sense, as the substrate shouldn’t matter.
Corollary: A social system (a corporation?) should also be able to be conscious if the structure is right.
Given our state of knowledge about consciousness, it’s indeed not impossible that modern LLMs are conscious. I wouldn’t say it’s likely, and I definitely wouldn’t say they are as likely to be conscious as uploaded humans. But the point stands: we don’t know for sure, and we lack a proper way to figure it out.
Previously we could have vaguely pointed toward the Turing test, but we are past that stage now. Behavioral analysis of a model at this point is mostly unhelpful. A few tweaks can make the same LLM that previously confidently claimed not to be conscious swear that it is conscious and suffering. So what a current LLM says about the nature of its consciousness gives us about 0 bits of evidence.
This is another reason to stop making bigger models and spend a lot of time figuring out what we have already created. At some point we may create a conscious LLM, fail to tell the difference, and end up with a moral catastrophe.
I don’t think that in the example you give, you’re making a token-predicting transformer out of a human emulation; you’re making a token-predicting transformer out of a virtual system with a human emulation as a component. In the system, the words “what’s your earliest memory?” appearing on the paper are going to trigger all sorts of interesting (emulated) neural mechanisms that eventually lead to a verbal response, but the token predictor doesn’t necessarily need to emulate any of that. In fact, if the emulation is deterministic, it can just memorize whatever response is given. Maybe gradient descent is likely to make the LLM conscious in order to efficiently memorize the outputs of a partly conscious system, but that’s not obvious.
If you have a brain emulation, the best way to get a conscious LLM seems to me like it would be finding a way to tokenize emulation states and training it on those.
Should it make a difference? It’s the same iterative computation.
Yes, I talked about optimizations a bit. I think you are missing the point of this example. The point is that if you conclude, from the fact that this system is doing next-token prediction, that it’s definitely not conscious, you are wrong. And my example is an existence proof, kind of.
Not necessarily, a lot of information is being discarded when you’re only looking at the paper/verbal output. As an extreme example, if the emulated brain had been instructed (or had the memory of being instructed) to say the number of characters written on the paper and nothing else, the computational properties of the system as a whole would be much simpler than of the emulation.
I might be missing the point. I agree with you that an architecture that predicts tokens isn’t necessarily non-conscious. I just don’t think the fact that a system predicts tokens generated by a conscious process is reason to suspect that the system itself is conscious without some other argument.
No, I don’t think it would be “what the fuck” surprising if an emulation of a human brain was not conscious. I am inclined to expect that it would be conscious, but we know far too little about consciousness for it to radically upset my world-view about it.
Each of the transformation steps described in the post reduces my expectation that the result would be conscious somewhat. Not to zero, but definitely introduces the possibility that something important may be lost that may eliminate, reduce, or significantly transform any subjective experience it may have. It seems quite plausible that even if the emulated human starting point was fully conscious in every sense that we use the term for biological humans, the final result may be something we would or should say is either not conscious in any meaningful sense, or at least sufficiently different that “as conscious as human emulations” no longer applies.
I do agree with the weak conclusion as stated in the title—they could be as conscious as human emulations, but I think the argument in the body of the post is trying to prove more than that, and doesn’t really get there.
Well, it’s like asking whether the {human in a car as a single system} is or is not conscious. Firstly, it’s a weird question, because of course it is. And that holds even if you chain the human to the wheel in such a way that they will never be separated from the car.
What I did is constrain the possible actions of the human emulation. Not severely; the human can still say whatever, just with a constant compute budget, i.e. fixed time or number of iterative computation steps. Kind of like you can constrain the actions of a meaty human by putting them in jail or something. (… or in a time loop / under repeated complete memory wipes)
How would you expect this to possibly cash out? Suppose there are human emulations running around doing all things exactly like meaty humans. How exactly do you expect the announcement of a high scientific council to go: “We discovered that EMs are not conscious because …, and that’s important because of …”? Is that completely out of model for you? Or, like, can you give me an (even goofy) scenario along those lines?
Or do you think high-resolution simulations will fail to replicate the capabilities of humans, the outward look of them? I.e., special sauce/quantum fuckery/literal magic?