Thoughts on AGI consciousness / sentience

Summary

I’ll walk through my take on how consciousness works (basically endorsing “illusionism”), why I think computer programs can be conscious, and how that relates to moral and ethical obligations towards conscious entities in general and AIs in particular.

Up to that point, everything I say will be a pretty popular (albeit far from universal) opinion within the rationality / EA communities, from what I gather.

…But then the last section is my (I think) unusual-in-this-community take that we’re probably going to get conscious AGIs, even if we don’t want them and are trying to avoid them. I will argue that human-like consciousness is pretty closely entangled with the powerful capabilities that we need for a system to do things like autonomously invent technologies and solve hard problems.

Suggested audience: People who are curious about what I think about this topic, and/or want to talk about it with me, for whatever reason, despite the fact that I’m really not up on the literature compared to other people.

(If you’re just generally interested in AGI consciousness / sentience, a better starting point would probably be any of the articles here.)

Quick points before starting

  • I think the Blake Lemoine thing was very stupid. (I’ll explain why below.) I hope that people will not permanently associate “thinking about AGI sentience” with “that stupid Blake Lemoine thing”.

  • I feel sorely tempted to say “Don’t we have our hands full thinking about the impact of AGI on humans?” But on reflection, I actually don’t want to say that. AGI sentience is an important topic and we should be thinking about it.

  • This post will NOT discuss what to do about AGI sentience right now (if anything). That’s a complicated question, tied up with every other aspect of AGI deployment. For example, maybe the AGIs will wind up in charge, in which case we hardly need to be advocating for their well-being—they can take care of themselves! Anyway, I’ll talk about that kind of stuff in a different upcoming post, hopefully.

My take on the philosophy of consciousness / sentience, in a nutshell

(Side note: I haven’t read much philosophy of consciousness, sorry. I’m open to discussion.)

As background: when I say (sincerely) “I’m looking at my wristwatch, and it says it’s 8:00”, then somewhere in the chain of causation that led to me saying that, is my actual physical wristwatch (which reflects photons towards my eyes, which in turn stimulate my retinal photoreceptors, etc.). After all, if the actual wristwatch had nothing to do with that chain of causation, then how would I be able to accurately describe it??

By the same token, when I say (sincerely) “I’m conscious right now, let me describe the qualia…”, then somewhere in the chain of causation that led to me saying that, are “consciousness” and “qualia”, whatever those are. After all, if consciousness and qualia had nothing to do with that chain of causation, then how would I be able to accurately describe them??

I feel like I already understand, reasonably well, the chain of causation in my brain that leads to me saying the thing in the previous paragraph, i.e. “I’m conscious right now, let me describe the qualia…” See my Book Review: Rethinking Consciousness.

…And it turns out that there is nothing whatsoever in that chain of causation that looks like what we intuitively expect consciousness and qualia to look like.

Therefore, I need to conclude either that consciousness and qualia don’t exist, or that they exist but are not the ontologically fundamental parts of reality that they intuitively seem to be. (These two options might not be that different; maybe it’s just a matter of terminology?)

As I understand it, here I’m endorsing the “illusionism” perspective, as advocated (for example) by Keith Frankish, Dan Dennett, and Michael Graziano.

Next, if a computer chip is running algorithms similar to a human philosopher’s, with a similar chain of causation that leads the chip to emit similar descriptions of consciousness and qualia as human philosophers emit, for similar underlying reasons, then I think we have to say that whatever consciousness and qualia are (if anything), the chip has those things just as much as the human does.

(Side note: Transformer-based self-supervised language models like GPT-3 can emit human-sounding descriptions of consciousness, but (I claim) they emit those descriptions for very different underlying reasons than brains do, i.e., as a result of a very different chain of causation / algorithm. So, pace Blake Lemoine, I see those outputs as providing essentially no evidence one way or the other on whether today’s language models are sentient / conscious.)

How does that feed into morality?

(Side note: I haven’t read much philosophy of morality / ethics, sorry. I’m open to discussion.)

The fact that consciousness and suffering are not ontologically fundamental parts of reality is, umm, weird. (I believe illusionism, as above, but I do not grok illusionism.) If anything, thinking hard about the underlying nature of consciousness and suffering kinda tempts me towards nihilism!

However, nihilism is not decision-relevant. Imagine being a nihilist, deciding whether to spend your free time trying to bring about an awesome post-AGI utopia, vs sitting on the couch and watching TV. Well, if you’re a nihilist, then the awesome post-AGI utopia doesn’t matter. But watching TV doesn’t matter either. Watching TV entails less exertion of effort. But that doesn’t matter either. Watching TV is more fun (umm, for some people). But having fun doesn’t matter either. There’s no reason to throw yourself at a difficult project. There’s no reason NOT to throw yourself at a difficult project! So nihilism is just not a helpful decision criterion!! What else is there?

I propose a different starting point, what I call Dentin’s prayer: “Why do I exist? Because the universe happens to be set up this way. Why do I care (about anything or everything)? Simply because my genetics, atoms, molecules, and processing architecture are set up in a way that happens to care.”

If it’s about caring, well, I care about people, and I care about not behaving in a way that I’ll later regret (cf. future-proof ethics), which also entails caring about intellectual consistency, among other things.

So we wind up at plain old normal ethical and moral reasoning, where we think about things, probe our intuitions by invoking analogies and hypotheticals, etc.

When I do that, I wind up feeling pretty strongly that if an AGI can describe joy and suffering in a human-like way, thanks to human-like underlying algorithmic processes, then I ought to care about that AGI’s well-being.

For AIs that are very unlike humans and animals, I don’t really know how to think about them. Actually, even for nonhuman animals, I don’t really know how to think about them. Here’s where I’m at on that topic:

I vaguely think there are a couple ways to start with the phenomenon of human consciousness and extract “the core of the thing that I care about”, and I think that some of those possible “extracted cores” are present in (more or fewer) nonhuman animals, and that others are maybe even uniquely human. I don’t know which of the possible “extracted cores” is the one I should really care about. I would need to think about it more carefully before I have a solid opinion, let alone try to convince other people.

As of this writing, I think that I don’t care about the well-being of any AI that currently exists, at least among the ones that I’ve heard of. Indeed, for almost all existing AIs, I wouldn’t even know how to define its “well-being” in the first place! More specifically, I think that for any of the plausible choices of “the core of the thing that I care about”, today’s AIs don’t have it.

But this is a bit of a moot point for this post, because I mostly want to talk about powerful AGIs, and those systems will not (I believe) be very unlike humans and animals, as discussed next:

Why I expect AGIs to be sentient / conscious, whether we want that or not

I think I differ from most people in AGI safety / alignment in expecting that, when we eventually figure out how to build powerful AGIs that can do things like invent technology and autonomously solve scientific research problems, it will turn out that those AGIs will be just as sentient / conscious / whatever as adult humans are. I don’t think this is a choice; I think we’re going to get that whether we want it or not. This is somewhat tied up with my expectation that future AGI will be brain-like AGI (basically, a version of model-based RL). I won’t try to fully convey my perspective in this post, but here are some relevant points:

First, one popular perspective in the community is that we can make AI that’s kinda like a “tool”, lacking not only self-awareness but also any form of agency, and that this is a path to safe & beneficial AGI. I agree with the first part—we can certainly make such AIs, and indeed we are already doing so. But I don’t know of any plausible permanent solution to the problem of people also trying to make more agent-y AGIs. For more on why I think that, see my discussion of “RL on thoughts” in Section 7.2 here. Anyway, these more agent-y AGIs are the ones that I want to talk about.

Second, now that we’ve switched over to the topic of agent-y AGIs, an important question becomes: what aspects of the system are in the source code, versus in the learned weights of a trained model? Eliezer Yudkowsky has been at both extreme ends of this spectrum. In his older writings, he often talked about putting rational utility-maximization into the AGI source code, more or less. Whereas in his more recent writings, he often talks about the Risks From Learned Optimization model, wherein not only rationality but everything about the agent emerges inside the weights of a trained model.

By contrast, I’m right in the middle of those two extremes! In my expected development scenario (brain-like model-based RL), there are some aspects of agency and planning in the source code, but the agent by default winds up as a mess of mutually-contradictory preferences and inclinations and biases etc. I just don’t see how to avoid that, within a computationally-bounded agent. However, within that architecture, aspects of rational utility-maximization might (or might not) emerge via (meta)learning.
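To make that middle-of-the-spectrum picture a bit more concrete, here’s a toy sketch in Python. Every name in it (WorldModel, ValueHead, plan_one_step) and every detail is my own illustrative invention, not a real system or a claim about how brain-like AGI will actually be implemented; the point is just the division of labor: the planning loop is fixed, hand-written source code, while the world model and the (possibly mutually-contradictory) preferences live in learned parameters.

```python
import random

class WorldModel:
    """Learned component: predicts the next (latent) state given a state and an action.
    In a brain-like AGI this would be a big trained network; here it is a stub."""
    def predict(self, state, action):
        # Toy dynamics: a "state" is just the history of actions taken so far.
        return state + [action]

class ValueHead:
    """One learned bundle of preferences. A real agent might have many of these,
    trained on different signals, and they need not agree with one another."""
    def __init__(self, name):
        self.name = name
    def evaluate(self, state):
        # Toy stand-in for a learned value estimate.
        return random.random()

def plan_one_step(world_model, value_heads, state, candidate_actions):
    """Hand-written planning loop (the "source code" part of the agent):
    imagine each candidate action with the learned world model, score the
    imagined outcome with the learned value heads, and pick the best."""
    def score(action):
        imagined = world_model.predict(state, action)
        return sum(head.evaluate(imagined) for head in value_heads)
    return max(candidate_actions, key=score)

# Usage: fixed planning code, learned (and possibly conflicting) preferences.
chosen = plan_one_step(
    world_model=WorldModel(),
    value_heads=[ValueHead("curiosity"), ValueHead("comfort"), ValueHead("social")],
    state=[],
    candidate_actions=["explore", "rest", "ask_for_help"],
)
print(chosen)
```

Note that the sum inside plan_one_step is a deliberately crude way of combining the value heads; nothing in the fixed code forces the resulting behavior to look like coherent utility-maximization, which is the sense in which rationality would have to be (meta)learned on top rather than baked in.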

Why is that important? Because if the agent has to (meta)learn better and better strategies for things like brainstorming and learning and planning and understanding, I think this process entails the kind of self-reflection that constitutes full-fledged self-aware human-like consciousness. In other words, coming up with better brainstorming strategies etc. entails the AGI introspecting on its own attention schema, and this same process, from the AGI’s perspective, would look like the AGI being aware of its own conscious experience. So the AGI would be able to describe its consciousness in a human-like way, for human-like reasons.
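To gesture at what “introspecting on its own attention schema” could look like computationally, here’s another toy Python sketch. Again, every name here (AttentionSchema, metalearned_brainstorming_step) is an illustrative assumption of mine, and whether real AGIs will work anything like this is exactly the speculation in the paragraph above; the sketch just shows an agent consulting a compressed model of its own attention in order to brainstorm better.

```python
class AttentionSchema:
    """Toy stand-in for the agent's simplified self-model of its own attention
    (in the spirit of Graziano-style attention schema theory). It does not capture
    the full attention machinery, only a compressed summary of it."""
    def __init__(self):
        self.currently_attending_to = "the unsolved subproblem"
        self.recent_history = [
            "the unsolved subproblem",
            "the unsolved subproblem",
            "the deadline",
        ]

def metalearned_brainstorming_step(schema: AttentionSchema) -> str:
    """Illustrative "better brainstorming strategy": inspect the agent's own
    attention schema and redirect attention if it has been stuck on the same
    thing for a while."""
    if schema.recent_history.count(schema.currently_attending_to) > 1:
        return "deliberately switch attention to a fresh angle"
    return "keep attending to " + schema.currently_attending_to

schema = AttentionSchema()
print(metalearned_brainstorming_step(schema))
# -> deliberately switch attention to a fresh angle
```

The interesting part is not the trivial rule itself but the fact that any such strategy has to take the agent’s own attention state as an input; that self-inspection is the self-reflection I’m pointing at.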

So I don’t even think the AGI would be in a gray area: I think it would be indisputably conscious, according to any reasonable definition.

But I could be wrong! Happy to chat in the comments. :)