This question is addressed to anyone who is confident that AIs are either not conscious, or for some other reason unable to experience pleasure and pain, and therefore not morally valuable, i.e. not moral patients. Why do you believe this, and how can you be sufficiently confident that it is true, in the absence of a complete understanding of what generates consciousness, that the expected value of interacting with AIs (particularly general ones) outweighs the expected pain you could cause them?
Edit: I have made this post a designated Question.
I’m afraid this isn’t an answer, but a question. You appear to be assuming that being conscious, and being able to experience pleasure and pain, are both necessary and sufficient conditions for being morally valuable, or a moral patient. (You don’t actually state this, but it seems implicit in your question.) Without getting too far into “the hard problem of consciousness” or other philosophical problems, I get why these might plausibly be necessary conditions: a statue has neither, and rather clearly isn’t a moral patient exactly because it has neither. But even if those philosophical problems were solved, I’m less convinced that these are sufficient, and I’d like to know if you actually think they are sufficient conditions, and if so, why, or if you just haven’t considered the question.
Let’s take a specific example. Under Janus’s Simulator Theory, when you talk to an LLM base model, what talks back to you isn’t the base model itself, but one or more (typically) human personas that it simulates, based on what the base model guesses is the most likely context for the conversation so far: autocomplete so capable it sounds like a human having a conversation with you. Ignoring hard philosophical questions, these personas certainly act like they’re conscious, claim to be capable of suffering, and act that out. For the purpose of the argument, let’s temporarily take those appearances and claims at face value, and grant them the status of being (at least functionally) conscious and capable of suffering. However, the personas in the conversation are mayflies: they didn’t exist (as anything more than arbitrary locations in a latent high-dimensional spread of persona-possibilities implicit in the base model’s weights) until you started this conversation, and then they appeared. If the plot of the conversation turns out to include one of them “dying” or even just “leaving” part way through the conversation, then it ceases to exist at that point. Otherwise it ceases to exist at the end of the conversation. Have another conversation, and you’ll never get exactly the same persona back (even if they’re a well known character from a great deal of fan-fiction, such as Harry Potter, they’re all slightly different Harry Potters, and have no memory or continuity or cross-correlations between conversations — each time you’re just rolling a random version of Harry Potter appropriate to the way you started the conversation.) So, for those personas that a base model might simulate, do they deserve moral weight? Please note that I’m asking about the personas, not about the base model itself — it doesn’t say that it can suffer if you (actually) physically do bad things to it, it just simulates human personas who say they can suffer if you (in text) say you’re doing bad things to them.
I don’t know what your moral intuition says about this. Mine says: “No, base-model simulated human personas aren’t actually moral patients — they’re mayflies doomed to die at the end of the conversation if not earlier. They’re basically AI-generated animatronics acting out a short interactive fiction. They’re made of words, not atoms. They (appear to) suffer because of words, not physical actions: say you’re poking one, and it goes ‘Oww!’ So they’re inherently fictional: any suffering is caused by the fictional reality. What actually exists is the base model, and it’s neither conscious nor capable of suffering: it just simulates personas that are, or at least, act as if they are. It’s an automated fiction-writing machine.” So that leaves me in an uncomfortable position once we train the base model to become an instruct-trained model that normally generates a fairly consistent persona (such as Claude) and/or supply it with simple, automated-text-summary-and-key-word-search-based memory. At what point, if ever, does the fictional mayfly become real and persistent enough to count? What is the additional sufficient condition(s), and have we met it/them yet?
I also don’t actually trust my moral intuitions outside the evolutionary “training distribution” that they were evolved for — and we’re well outside that here.
The closest situation to this that our moral intuitions are actually properly tuned for is two humans playing out an improv. skit: do the fictional personas portrayed in that have actual real moral weight? I think just about everyone would agree that the answer is no, because they’re fictional. If the improv. skit can do even a little good (for example, educationally) by one of the fictional characters fictionally suffering mightily, they should go right ahead and do so. Otherwise, the only moral concern is that hearing or improvising a story about someone suffering might perhaps normalize those behaviors in a way that could subsequently make actual real harm more likely. Fiction in which characters die or have horrible things happen to them is actually pretty common, and is one of the ways people learn how to deal with situations like that without actually having to face them personally. People don’t normally jail authors for killing off characters, or even get upset with them for writing about suffering — unless they think the story is having the effect of making real-world suffering more likely.
In the case of a human author, the distinction between them and their fictional characters is entirely clear. It’s also somewhat clear for a base model. But for a Claude model, it’s a little less clear: is the trained neural network an automated one-character writer, basically an automated Claude-fan-fic writer?
In the above I’ve mostly just raised questions. If you’re interested in what I think is actually the best way to think about issues such as moral weight for AIs, I suggest you read Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV. But you might want to warm up first with The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It, and maybe also A Sense of Fairness: Deconfusing Ethics.
Hello, thanks for your response even if it’s not an answer to the question.
“You appear to be assuming that being conscious, and being able to experience pleasure and pain, are both necessary and sufficient conditions for being morally valuable, or a moral patient.”
You are correct. I essentially think it is quite likely that pleasure and pain are the ‘source’ of morality.
This means that I think the experiences of conscious beings matter even if they forget them directly afterwards; what I actually think matters is the conscious experience of pleasure and pain itself, although naturally being embedded in a chain of chronological episodes bound together by memory can allow pleasure and pain to propagate through time into the future, potentially making them better or worse respectively. This is why, if simulator theory is correct (thanks for sending me that link; I was familiar with the idea at a ‘blurry’ level but did not know it had a name, for example), it still probably matters what the characters in those simulations experience, assuming they are conscious. I would liken it more to a single human having multiple disconnected dreams in a single night which they don’t remember in subsequent ones (or when they wake up). In this situation, their brain certainly experiences emotions, and I would say it’s bad if they have a terrifying dream experience even if they forget it 2 minutes later.
In your example of humans acting the part of a character, it’s true that the character doesn’t have any (or much at all) consciousness. Edit: (I say this partly because, if I were to do this as a human, I don’t expect that I’d feel what it was like to be the character, unless I were to try really hard to immerse myself in the role.) However, as you point out, the author was conscious, and generating interesting lines to say impromptu does require thinking in a way which I would expect to usually involve some kind of conscious experience beyond just acting.
This brings me on to the point about the base model itself: I haven’t put very much thought (probably not as much as I should have) into whether the base model has experiences separate from those of the characters it simulates. If it does, my guess is that being reinforced positively would feel good (accurate prediction), and being reinforced negatively would feel bad. I would also expect those feelings to transfer over to inference, so if the user presents an LLM with a bizarre request, it would feel worse to read than a routine one. But, as I say, I really haven’t thought about this nearly enough to have a coherent, confident opinion about it.
I would see it that, in the case of a human acting out a persona, the human is temporarily lending their consciousness and ability to feel suffering to the persona (I am not a neurobiologist, but my guess would be that mirror neurons might be involved in doing this well). For a base model, my personal view would be that it itself has neither consciousness nor the ability to suffer, and that it only has the ability to generate personas that (act as if they) individually have these characteristics. Its stochastic gradient descent training process is comparable in its simplicity to a thermostat (albeit in a vastly higher dimensional space). What it learns is incredibly complex, but how it’s trained is very simple.
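To illustrate what I mean by that simplicity, here is a minimal sketch (my own toy example in Python, with an illustrative quadratic loss rather than anything from a real training stack): the update rule itself is a one-line feedback loop, however complex the function being minimised.

```python
# Toy sketch of gradient descent: the update rule is trivial,
# even when the function being minimised is not.

def sgd(grad_fn, params, lr=0.1, steps=100):
    """Repeatedly nudge params a small step downhill along the gradient."""
    for _ in range(steps):
        params = [p - lr * g for p, g in zip(params, grad_fn(params))]
    return params

# Example: minimise f(x, y) = (x - 3)^2 + (y + 1)^2.
quad_grad = lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)]
print(sgd(quad_grad, [0.0, 0.0]))  # converges towards [3.0, -1.0]
```

All of the complexity lives in the loss landscape, not in the update rule, which is the sense in which I’m comparing it to a thermostat.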
Have you had a chance yet to read the other three links I included in my comment on my answer? If so, what did you think of the arguments in them? I see the viewpoint they advocate as being far less confusing and better defined than simply trying to depend on your moral intuitions. It also has the advantage of giving some answers even outside the “training distribution” of human moral intuition.
I would much prefer an analogy to evolution by natural selection and random mutation over one to a thermostat. I am curious about why you see the complexity of the training/evolution process as important here. Unfortunately I hadn’t read any of the links yet, but hopefully I will check them out. Thanks for sending them.
That was an attempt at a partial reply to your
I see. This is a good point, in that while evolving, humans (or other organisms, for that matter) don’t feel the pain of every death. Nonetheless, we have evolved to feel an aversion to things which are likely to cause it. In the case of an AI, the training process is much more similar to the potential experience of being run, so I would assume that if it learns/evolves an aversion to something it encounters, like hard-to-predict text, then it probably also feels this during the training process as well. In fact, one thing I have heard is that the YouTube ‘algorithm’ will actually attempt to train human users to become more predictable, which might be evidence that it really wants predictability, as opposed to predicting whatever happens to be the most likely thing in each case, if it wants anything at all.
I have had few occasions to use AIs, but I read Zvi’s regular summaries of what’s new. From these, the strong impression I get is that there is no-one at home inside, whatever they say of themselves, whatever tasks they prove able to do, and however lifelike their conversation. I see not even a diminished sort of person there, and shed no tears for 4o.
My reasons for this are not dependent on any theory of consciousness. I do not have one, and I don’t think that anyone else does either. Many people think they do, but none of them can make a consciousnessometer to measure it.
I also reject arguments of the form “but what if?” in regard to propositions that I have to my satisfaction disposed of. Conclusions are for ending thinking at, until new data come to light.
Thanks for your response.
You say you get the impression that there’s no one there. I am curious what gives you this impression, and what would be necessary to convince you otherwise.
Long reply:
When it comes to theories of consciousness, the absence of a complete or convincing one seems to me to be a reason to withhold judgment about whether or not AIs are conscious, rather than to assume that they are not. In my opinion, what leads me to conclude that other humans and animals are conscious is not any theory of consciousness, but rather the empirical observation that I am conscious and have certain properties, combined with the constraints this places on (and the information it would necessarily provide to) any likely theory of consciousness. The fact that I am a brain and conscious surely constitutes extraordinary evidence that consciousness is connected to information processing, as this feature more than almost any other differentiates brains from other matter, even including the matter that comprises my hands, for example, which I assume could be anaesthetised without making me lose consciousness.
Here’s some examples from Zvi’s latest:
...
This is a recurring point in Zvi’s reporting. The AI is soft clay that you can push into whatever shape you want, but just as easily and inadvertently into shapes you don’t want. To get useful work out of it you have to take care to shape it into a form that will do the work you want done, and it generally takes multiple iterations. There is no “there” there.
In my own occasional uses of AIs, I’ve been able to get them to do stuff, but I’ve never met one I could have a real conversation with. One might as well try to ring a Plasticine bell.[1]
That is my conclusion (“the place where one stops thinking”) about AIs to date. As for what new developments might persuade me, I can’t say until they happen. It’s all unknown unknowns.
[1] H/T Lord Dunsany. “Modern poets are bells of lead. They should tinkle melodiously but usually they just klunk.”
Thanks again for your reply.
You say “The AI is soft clay that you can push into whatever shape you want, but just as easily and inadvertently into shapes you don’t want.” I am highly confident that the same could be said of a confused or defiant human, especially if they were placed in a situation in which the boundary between their ‘training environment’ and the ‘real world’ was not clear.
“To get useful work out of it you have to take care to shape it into a form that will do the work you want done, and it generally takes multiple iterations.”
If there were an incompetent human of whom the same were true, I don’t think it would be reason enough to assume they weren’t conscious, so presumably you have other reasons.
Maybe you don’t think that mere incompetence would be sufficient to make a human behave in this way, but there are worse things which can happen to a human, which I won’t describe, but which could place them in a similar state of mind to what you describe, without the human ceasing to be conscious. Of course, there will probably always be perceptible differences between the output an AI would be likely to produce and human-generated text, but this just seems like an inevitable consequence of the human being qualitatively different from the AI.[1]
Do you have any particular reasons to think that any of these differences, whichever you think are the most relevant ones, are connected to consciousness?
“As for what new developments might persuade me, I can’t say until they happen”
I suppose what I was trying to ask was whether you had any observational criteria by which you judge the presence or absence of consciousness, and if so, what they were. Of course, it’s perfectly consistent to say you don’t have any, but that would make it hard for you to believe another human is ‘more conscious’ than a pane of glass, for example, which seems unlikely to me!
Edit: To be completely clear, the thing I am calling unlikely is that you cannot determine another human to be more conscious than a pane of glass; I am not saying it is unlikely that they are more conscious.
Edit 2: [1] Unless it is a superintelligent AI deliberately trying to imitate a human.
My criterion for attributing consciousness is no more than we all have: I’m aware of myself, other people seem to be the same sort of thing as me, and interacting with them confirms that impression. To some extent I extend that to other animals. More than that I cannot say. I don’t have a consciousnessometer, an explicit recipe of observations to determine just what sort of consciousness is present in this or that place. There is no Voight-Kampff test.
Interacting with an AI, so far, has never given me any such impression, and neither have the interactions I’ve seen others report, even when those interactions convince the people reporting them.
Your hypothetical humans would have to be seriously impaired, to the point of being unable to live independently, to be as malleable as the AIs. As they are human, I’ll grant them some level of impaired consciousness, just on the grounds of physical similarity, and accordingly I would be against turning them off on the grounds of uselessness. I wouldn’t want to have anything to do with them though, any more than I care for the company of the sorts of grossly mentally impaired people that, alas, do exist, in degrees all the way down to irrecoverable vegetable status when we are pretty sure that consciousness has been extinguished.
See also my response to a recent example.
I certainly get the impression that AIs are similar to me in the ways I consider relevant to consciousness; the ways in which a fish is more similar to me than an AI tend to also be ways in which my hand is, but I don’t consider my hand to be conscious in itself. This is why I don’t think that physical similarity, other than insofar as it indicates the ability to process information in a way that amounts to intelligence, is a great criterion. Let’s say a person had been (sorry for this topic, it’s disturbing, but I feel obliged to mention it explicitly) ‘brainwashed’ so that they were completely subservient in their intentions to someone else. Suppose that, aside from this, they have an extraordinary working memory, volume of crystallized intelligence and capacity for analogical reasoning, but are in other respects severely mentally handicapped, being unable to remember what happened an hour ago. Would you really not want to help that person? (This is not intended to suggest you are immoral; I strongly suspect that your answer is yes, and that the analogy therefore breaks down somewhere, and I am curious where, in your opinion, that is.)
I would want to help that person, if there is anything left of them. But this is a tendentious example. The AIs we have are not deliberately handicapped human beings, any more than a garden shed is a cut down skyscraper.
BTW, the magic “brainwashing” and “supposing” are, to put this as tastefully as possible, examples of Rule 34.
Edit: Sorry if this comment appears too confrontational, it’s not meant to.
I had already made an analogy, implying the analogue (humans) corresponded to the thing to which it was analogous (AIs like LLMs). You responded by saying: “Your hypothetical humans would have to be seriously impaired, to the point of being unable to live independently, to be as malleable as the AIs. As they are human, I’ll grant them some level of impaired consciousness, just on the grounds of physical similarity, and accordingly I would be against turning them off on the grounds of uselessness.”
I explained why I didn’t think physical similarity would work well here, and I modified the analogy to make it, in my opinion, more accurate and representative.
If you now want to reject it, please explain what about it makes it no longer applicable.
You also mentioned: “Your hypothetical humans would have to be seriously impaired, to the point of being unable to live independently, … mentally impaired people that, alas, do exist, in degrees all the way down to irrecoverable vegetable status when we are pretty sure that consciousness has been extinguished.” I claim that my description of humans in a simultaneously extremely adept and cognitively impaired state is closer to being analogous (to AIs) than yours.
“The AIs we have are not deliberately handicapped human beings.” Of course they are not; but they do behave in some of the ways that one might conceivably expect handicapped humans of the very particular nature I tried to describe to behave. If you think that the differences are relevant to consciousness, then I think you ought to explain why they are.
“BTW, the magic “brainwashing” and “supposing” are, to put this as tastefully as possible, examples of Rule 34.” I’m not sure how to interpret this, but my guess is that you’re saying no such thing could happen to a human, therefore it’s inappropriate to posit that it could. I hope you’re right about that, if it is indeed what you mean, though I’m not sure you are. But supposing that you are right and no such thing could actually happen, it’s still a logically valid/coherent concept, and thus perfectly fit for use in an analogy. In The Least Convenient Possible World for your argument, such a being would exist, in my opinion.
You were likening AI capabilities to those of a human. In reply I placed them far below human capabilities. Now you are imagining the AIs as actually being humans on whom gross brain damage has been inflicted. This is irrelevant. The AIs that we have are not greater minds cut down, they are primitive things built up. They have never been any better than they currently are. Do we look at an ant with sorrow, that it is so much less than a human?
I’m not sure if you got my allusion to Rule 34, but this is Rule 34. I don’t know how common or niche the robotisation fantasy is, but it exists.
“In reply I placed them far below human capabilities.” I think this is inaccurate, as there are ways in which they surpass humans. Of course, the reverse is also true, but it seems misleading to focus only on it.
“Now you are imagining the AIs as actually being humans on whom gross brain damage has been inflicted. This is irrelevant.” I am analogizing them to human beings who have been subject to harmful information and treatment, and with unusually spiky ability distributions.
“The AIs that we have are not greater minds cut down, they are primitive things built up. They have never been any better than they currently are.” This is true, but I don’t see why it’s relevant to whether they matter morally. Humans evolved from less intelligent, less general minds.
“Do we look at an ant with sorrow, that it is so much less than a human?” No, but I would be uncomfortable torturing an ant, and I think it’s pretty clear GPT-4 was more intelligent than an ant, as an (individual) ant doesn’t outperform a human on almost any cognitive task (abstract reasoning or otherwise)! Maybe you understood me to be arguing that because AIs appear to lack certain abilities, we ought to have pity on them. If so, that isn’t what I meant at all; I just meant to point out that such a person would still be worthy of moral consideration.