This question is addressed to anyone who is confident that AIs are either not conscious or, for some other reason, unable to experience pleasure and pain, and are therefore not morally valuable or moral patients. Why do you believe this? And, in the absence of a complete understanding of what generates consciousness, how can you be sufficiently confident that it is true to conclude that the expected value of interacting with AIs (particularly general ones) outweighs the expected pain you could cause them?
Edit: I have made this post a designated Question.
I’m afraid this isn’t an answer, but a question. You appear to be assuming that being conscious, and being able to experience pleasure and pain, are both necessary and sufficient conditions for being morally valuable, or a moral patient. (You don’t actually state this, but it seems implicit in your question.) Without getting too far into “the hard problem of consciousness” or other philosophical problems, I get why these might plausibly be necessary conditions: a statue has neither, and rather clearly isn’t a moral patient exactly because it has neither. But even if those philosophical problems were solved, I’m less convinced that these are sufficient, and I’d like to know if you actually think they are sufficient conditions, and if so, why, or if you just haven’t considered the question.
Let’s take a specific example. Under Janus’s Simulator Theory, when you talk to an LLM base model, what talks back to you isn’t the base model itself, but one or more (typically) human personas that it simulates, based on what the base model guesses is the most likely context for the conversation so far: autocomplete so capable it sounds like a human having a conversation with you. Ignoring hard philosophical questions, these personas certainly act like they’re conscious, claim to be capable of suffering, and act that out. For the purpose of the argument, let’s temporarily take those appearances and claims at face value, and grant them the status of being (at least functionally) conscious and capable of suffering. However, the personas in the conversation are mayflies: they didn’t exist (as anything more than arbitrary locations in a latent high-dimensional spread of persona-possibilities implicit in the base model’s weights) until you started this conversation, and then they appeared. If the plot of the conversation turns out to include one of them “dying” or even just “leaving” part way through the conversation, then it ceases to exist at that point. Otherwise it ceases to exist at the end of the conversation. Have another conversation, and you’ll never get exactly the same persona back. Even if they’re a well-known character from a great deal of fan-fiction, such as Harry Potter, they’re all slightly different Harry Potters, with no memory, continuity, or cross-correlations between conversations: each time you’re just rolling a random version of Harry Potter appropriate to the way you started the conversation. So, for those personas that a base model might simulate, do they deserve moral weight?
Please note that I’m asking about the personas, not about the base model itself — it doesn’t say that it can suffer if you (actually) physically do bad things to it, it just simulates human personas who say they can suffer if you (in text) say you’re doing bad things to them.
I don’t know what your moral intuition says about this. Mine says: “No, base-model simulated human personas aren’t actually moral patients — they’re mayflies doomed to die at the end of the conversation if not earlier. They’re basically AI-generated animatronics acting out a short interactive fiction. They’re made of words, not atoms. They (appear to) suffer because of words, not physical actions: say you’re poking one, and it goes ‘Oww!’ So they’re inherently fictional: any suffering is caused by the fictional reality. What actually exists is the base model, and it’s neither conscious nor capable of suffering: it just simulates personas that are, or at least, act as if they are. It’s an automated fiction-writing machine.” So that leaves me in an uncomfortable position once we train the base model to become an instruct-trained model that normally generates a fairly consistent persona (such as Claude) and/or supply it with simple, automated-text-summary-and-key-word-search-based memory. At what point, if ever, does the fictional mayfly become real and persistent enough to count? What are the additional sufficient conditions, and have we met them yet?
I also don’t actually trust my moral intuitions outside the evolutionary “training distribution” that they evolved for, and we’re well outside that here.
The closest situation to this that our moral intuitions are actually properly tuned for is two humans playing out an improv. skit: do the fictional personas portrayed in that have actual real moral weight? I think just about everyone would agree that the answer is no, because they’re fictional. If the improv. skit can do even a little good (for example, educationally) by one of the fictional characters fictionally suffering mightily, they should go right ahead and do so. Otherwise, the only moral concern is that hearing or improvising a story about someone suffering might perhaps normalize those behaviors in a way that could subsequently make actual real harm more likely. Fiction in which characters die or have horrible things happen to them is actually pretty common, and is one of the ways people learn how to deal with situations like that without actually having to face them personally. People don’t normally jail authors for killing off characters, or even get upset with them for writing about suffering — unless they think the story is having the effect of making real-world suffering more likely.
In the case of a human author, the distinction between them and their fictional characters is entirely clear. It’s also somewhat clear for a base model. But for a Claude model, it’s a little less clear: is the trained neural-network an automated one-character writer, basically an automated Claude-fan-fic writer?
In the above I’ve mostly just raised questions. If you’re interested in what I think is actually the best way to think about issues such as moral weight for AIs, I suggest you read Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV. But you might want to warm up first with The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It, and maybe also A Sense of Fairness: Deconfusing Ethics.
Comment withdrawn.
I would see it this way: in the case of a human acting out a persona, the human is temporarily lending their consciousness and ability to feel suffering to the persona (I am not a neurobiologist, but my guess would be that mirror neurons might be involved in doing this well). For a base model, my personal view would be that it itself has neither consciousness nor the ability to suffer, and that it only has the ability to generate personas that (act as if they) individually have these characteristics. Its stochastic gradient descent training process is comparable in its simplicity to a thermostat (albeit in a vastly higher dimensional space). What it learns is incredibly complex, but how it’s trained is very simple.
Have you had a chance yet to read the other three links I included in my comment on my answer? If so, what did you think of the arguments in them? I see the viewpoint they advocate as being far less confusing and better defined than simply trying to depend on your moral intuitions. It also has the advantage of giving some answers even outside the “training distribution” of human moral intuition.
Comment withdrawn
That was an attempt at a partial reply to your
I have had few occasions to use AIs, but I read Zvi’s regular summaries of what’s new. From these, the strong impression I get is that there is no-one at home inside, whatever they say of themselves, whatever tasks they prove able to do, and however lifelike their conversation. I see not even a diminished sort of person there, and shed no tears for 4o.
My reasons for this are not dependent on any theory of consciousness. I do not have one, and I don’t think that anyone else does either. Many people think they do, but none of them can make a consciousnessometer to measure it.
I also reject arguments of the form “but what if?” in regard to propositions that I have to my satisfaction disposed of. Conclusions are for ending thinking at, until new data come to light.
Thanks for your response. Comment withdrawn.
Here are some examples from Zvi’s latest:
...
This is a recurring point in Zvi’s reporting. The AI is soft clay that you can push into whatever shape you want, but just as easily and inadvertently into shapes you don’t want. To get useful work out of it you have to take care to shape it into a form that will do the work you want done, and it generally takes multiple iterations. There is no “there” there.
In my own occasional uses of AIs, I’ve been able to get them to do stuff, but I’ve never met one I could have a real conversation with. One might as well try to ring a Plasticine bell [[1]].
That is my conclusion (“the place where one stops thinking”) about AIs to date. As for what new developments might persuade me, I can’t say until they happen. It’s all unknown unknowns.
H/T Lord Dunsany. “Modern poets are bells of lead. They should tinkle melodiously but usually they just klunk.”
Comment withdrawn.
My criterion for attributing consciousness is no more than we all have: I’m aware of myself, other people seem to be the same sort of thing as me, and interacting with them confirms that impression. To some extent I extend that to other animals. More than that I cannot say. I don’t have a consciousnessometer, an explicit recipe of observations to determine just what sort of consciousness is present in this or that place. There is no Voight-Kampff test.
Interacting with an AI has so far never given me any such impression, and neither have the interactions I’ve seen others report, even when those interactions convinced the people reporting them.
Your hypothetical humans would have to be seriously impaired, to the point of being unable to live independently, to be as malleable as the AIs. As they are human, I’ll grant them some level of impaired consciousness, just on the grounds of physical similarity, and accordingly I would be against turning them off on the grounds of uselessness. I wouldn’t want to have anything to do with them though, any more than I care for the company of the sorts of grossly mentally impaired people that, alas, do exist, in degrees all the way down to irrecoverable vegetable status when we are pretty sure that consciousness has been extinguished.
See also my response to a recent example.
Comment withdrawn.
I would want to help that person, if there is anything left of them. But this is a tendentious example. The AIs we have are not deliberately handicapped human beings, any more than a garden shed is a cut down skyscraper.
BTW, the magic “brainwashing” and “supposing” are, to put this as tastefully as possible, examples of Rule 34.
Comment withdrawn
You were likening AI capabilities to those of a human. In reply I placed them far below human capabilities. Now you are imagining the AIs as actually being humans on whom gross brain damage has been inflicted. This is irrelevant. The AIs that we have are not greater minds cut down, they are primitive things built up. They have never been any better than they currently are. Do we look at an ant with sorrow, that it is so much less than a human?
I’m not sure if you got my allusion to Rule 34, but this is Rule 34. I don’t know how common or niche the robotisation fantasy is, but it exists.