Goal-thinking vs desire-thinking

[Adapted from an old post on my personal blog]

There are a lot of long-running arguments on the internet that basically consist of people arguing past each other due to differing basic assumptions that they don’t know how to make explicit, preventing them from noticing the fundamental disagreement. I’ve noticed a few of these and tried to see if I can make both sides more explicit. In this post I’d like to try to explicate one.

Let’s start with a concrete example: there are a number of people who would say that wireheading is a good thing, which is obviously not the general thinking on LW. What’s the source of this disagreement? One possible explanation would be to say that the former are saying “happiness is our only terminal value, all other values are subsidiary to it”, while the latter say “hell no it’s not”, but I think there’s more to it than that.

Without yet saying what I think the fundamental distinction is, let me give another example that I think stems from the same disagreement. Consider this essay—and this isn’t the only thing I’ve seen along these lines—which takes the point of view that obviously a rational person would kill themselves, while to me this just seems… dumb.

So what’s going on here? What’s the actual distinction that leads to such arguments? Again, I can’t know, but here’s my hypothesis. I think there are two sorts of thinking going on here; I’m going to call them “goal-thinking” and “desire-thinking” (these are my own terms, feel free to devise better ones).

So, goal-thinking is thinking in terms of what I’m calling “goals”. Goals are to be accomplished. If you’re thinking in terms of goals, what you’re afraid of is being thwarted, or having your capacity to act, to effect your goals, reduced: being somehow disabled or restrained. If your capabilities are reduced, you have less ability to affect the future and steer it towards what you want. (This is important; goal-thinking thinks in terms of preferences about the future.) The ultimate example of this is death: if you’re dead, you can’t affect anything anymore. While it’s possible in some unusual cases that dying could help accomplish your goals, it’s pretty unlikely; most of the time, you’re better off remaining alive so that you can continue to affect things. So suicide is almost always unhelpful. Goals, remember, are about the world, external to oneself.

Wireheading is similarly disastrous, because it’s just another means of rendering oneself inactive. We can generalize “wireheading” of course to anything that causes one to think one has accomplished one’s goals when one hasn’t. Or of course to having one’s goals altered. We all know this argument; this is just the old “murder pill” argument. Indeed, you’ve likely noticed by this point that I’m just recapitulating Omohundro’s basic AI drives.

Another way of putting this is, goals themselves are driving forces.

So what’s the alternative, “desire-thinking”, that I’m claiming is how many people think? One answer would be to say, this alternative way of thinking is that “it’s all about happiness vs unhappiness” or “it’s all about pleasure vs pain”, thinking in terms of internal experience rather than the external state of the world—so for instance, people thinking this way tend to focus on unhappiness, pain, and suffering as the general bad thing, rather than having one’s capacity to act reduced.

But, as I basically already said above, I actually don’t think this gets at the root of the distinction, because there are still things this fails to explain. For instance, I think it fails to explain the suicide article above, or, say, Buddhism, since taking the goal-thinking point of view and applying it to internal experiences would just lead to hedonism. And presumably there are a number of people thinking that way! (Which may include a number of the “wireheading is good” people.) But we can basically group this in as a variant of goal-thinking. How do we explain the truly troublesome cases above, that don’t fit into this?

I think what’s actually going on with these cases involves not thinking in terms of goals in the above sense at all, but rather what I’m calling “desires” instead. The distinction is that whereas goals are to be accomplished, desires are to be extinguished. From a goal-thinking point of view, you can model this as having one single goal, “extinguish all desires”, which is the only driving force; and the desires themselves are, just, like, objects in the model, not themselves driving forces.

So under the desire-thinking point of view, having one’s desires altered can be a good thing, if the new ones are easier to extinguish. If you can just make yourself not care, great. Wireheading is excellent from this point of view, and even killing oneself can work. Indeed, desire-thinking doesn’t really think in terms of preferences about the future, so much as just an anticipation of having preferences in the future (about the then-present).
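To make the contrast concrete, here is a toy sketch of the two points of view as scoring functions over the same outcomes. Everything in it is my own illustrative assumption, not a real decision theory: the `Outcome` fields, the weights in `goal_score`, and the made-up numbers in `OUTCOMES` are all hypothetical. The only point is that goal-thinking scores the external world (plus one's remaining capacity to act), while desire-thinking scores only how many desires are left unextinguished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    goal_achieved: bool   # did the external world end up the way the agent wanted?
    can_still_act: bool   # does the agent keep its capacity to affect the future?
    open_desires: int     # desires still unextinguished afterwards (made-up numbers)


# Hypothetical outcomes, chosen only to illustrate the argument in the text.
OUTCOMES = {
    "pursue the goal": Outcome(goal_achieved=True, can_still_act=True, open_desires=1),
    "do nothing": Outcome(goal_achieved=False, can_still_act=True, open_desires=3),
    "wirehead": Outcome(goal_achieved=False, can_still_act=False, open_desires=0),
    "suicide": Outcome(goal_achieved=False, can_still_act=False, open_desires=0),
}


def goal_score(o: Outcome) -> int:
    # Goal-thinking: the state of the world is what matters, and keeping the
    # capacity to act is instrumentally valuable. The weights are arbitrary.
    return 2 * int(o.goal_achieved) + int(o.can_still_act)


def desire_score(o: Outcome) -> int:
    # Desire-thinking, modeled as a single goal: "extinguish all desires".
    # The desires themselves are just objects being counted.
    return -o.open_desires


def ranking(score):
    # Best outcome first; ties keep their order from OUTCOMES.
    return sorted(OUTCOMES, key=lambda name: score(OUTCOMES[name]), reverse=True)


goal_ranking = ranking(goal_score)
desire_ranking = ranking(desire_score)
print("goal-thinking:  ", goal_ranking)
print("desire-thinking:", desire_ranking)
```

Under these assumed numbers, goal-thinking puts wireheading and suicide at the bottom, while desire-thinking puts them at the top, since they leave no desires open at all.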

Now while I, and LW more generally, may sympathize more with the former point of view, it’s worth noting that in reality nobody uses entirely one or the other. Or at least, it seems pretty clear that even here people won’t actually endorse pure goal-thinking for humans (although it’s another matter for AIs; this is one of those times when it’s worth remembering that LW really has two different functions, refining the art of human rationality and refining the art of AI rationality, and that these are not always the same thing). While I don’t have a particular link on hand, this issue has often been discussed here before in terms of preferences regarding flavors of ice cream, and how it’s not clear that one should resist modifications to these; this can be explained if one imagines that desire-thinking should be applied to such cases.

Thus when Eliezer Yudkowsky says “I wouldn’t want to take a pill that would cause me to want to kill people, because then maybe I’d kill people, and I don’t want that”, we recognize it as an important principle of decision theory; but when someone says “I don’t like spinach, and I’m glad I don’t, because if I liked it I’d eat it, and I just hate it”, we correctly recognize this as a joke. (Despite it being isomorphic.) Still, despite people not actually being all one way or the other, I think it’s a useful way of understanding some arguments that have resulted in a lot of people talking past each other.