CEV: a utilitarian critique

I’m posting this article on behalf of Brian Tomasik, who authored it but is at present too busy to respond to comments.

Update from Brian: “As of 2013-2014, I have become more sympathetic to at least the spirit of CEV specifically and to the project of compromise among differing value systems more generally. I continue to think that pure CEV is unlikely to be implemented, though democracy and intellectual discussion can help approximate it. I also continue to feel apprehensive about the conclusions that a CEV might reach, but the best should not be the enemy of the good, and cooperation is inherently about not getting everything you want in order to avoid getting nothing at all.”


I’m often asked questions like the following: If wild-animal suffering, lab universes, sentient simulations, etc. are so bad, why can’t we assume that Coherent Extrapolated Volition (CEV) will figure that out and do the right thing for us?


Most of my knowledge of CEV is based on Yudkowsky’s 2004 paper, which he admits is obsolete. I have not yet read most of the more recent literature on the subject.

Reason 1: CEV will (almost certainly) never happen

CEV is like a dream for a certain type of moral philosopher: Finally, the ideal solution for discovering what we really want upon reflection!

The fact is, the real world is not decided by moral philosophers. It’s decided by power politics, economics, and Darwinian selection. Moral philosophers can certainly have an impact through these channels, but they’re unlikely to convince the world to rally behind CEV. Can you imagine the US military—during its AGI development process—deciding to adopt CEV? No way. It would adopt something that ensures the continued military and political dominance of the US, driven by mainstream American values. Same goes for China or any other country. If AGI is developed by a corporation, the values will reflect those of the corporation or the small group of developers and supervisors who hold the most power over the project. Unless that group is extremely enlightened, CEV is not what we’ll get.

Anyway, this is assuming that the developers of AGI can even keep it under control. Most likely AGI will turn into a paperclipper or else evolve into some other kind of Darwinian force over which we lose control.

Objection 1: “Okay. Future military or corporate developers of AGI probably won’t do CEV. But why do you think they’d care about wild-animal suffering, etc. either?”

Well, they might not, but if we make the wild-animal movement successful, then in ~50-100 years when AGI does come along, the notion of not spreading wild-animal suffering might be sufficiently mainstream that even military or corporate executives would care about it, at least to some degree.

If post-humanity does achieve astronomical power, it will only be through AGI, so there’s high value in influencing the future developers of an AGI. For this reason I believe we should focus our meme-spreading on those targets. However, this doesn’t mean they should be our only focus, for two reasons: (1) Future AGI developers will themselves be influenced by their friends, popular media, contemporary philosophical and cultural norms, etc., so if we can change those things, we will diffusely impact future AGI developers too. (2) We need to build our movement, and the lowest-hanging fruit for new supporters is the set of people most interested in the cause (e.g., antispeciesists, environmental-ethics students, transhumanists). We should reach out to them to expand our base of support before going after the big targets.

Objection 2: “Fine. But just as we can advance values like preventing the spread of wild-animal suffering, couldn’t we also increase the likelihood of CEV by promoting that idea?”

Sure, we could. The problem is, CEV is not an optimal thing to promote, IMHO. It’s sufficiently general that lots of people would want it, so for ourselves, the higher leverage comes from advancing our particular, more idiosyncratic values. Promoting CEV is kind of like promoting democracy or free speech: It’s fine to do, but if you have a particular cause that you think is more important than other people realize, it’s probably going to be better to promote that specific cause than to jump on the bandwagon and do the same thing everyone else is doing, since the bandwagon’s cause may not be what you yourself prefer.

Indeed, for myself, it’s possible CEV could be a net bad thing, if it would reduce the likelihood of paperclipping—a future which might (or might not) contain far less suffering than a future directed by humanity’s extrapolated values.

Reason 2: CEV would lead to values we don’t like

Some believe that morality is absolute, in which case a CEV’s job would be to uncover what that is. This view is mistaken, for the following reasons: (1) The existence of a separate realm of reality where ethical truths reside violates Occam’s razor, and (2) even if such truths did exist, why would we care what they were?

Yudkowsky and the LessWrong community agree that ethics is not absolute, so they have different motivations behind CEV. As far as I can gather, the following are two of them:

Motivation 1: Some believe CEV is genuinely the right thing to do

As Eliezer said in his 2004 paper (p. 29), “Implementing CEV is just my attempt not to be a jerk.” Some may believe that CEV is the ideal meta-ethical way to resolve ethical disputes.

I have to differ. First, the set of minds included in CEV is totally arbitrary, and hence so will be the output. Why include only humans? Why not animals? Why not dead humans? Why not humans who weren’t born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you’re already inserting your values into the process.

And then once you’ve picked the set of minds to extrapolate, you still have astronomically many ways to do the extrapolation, each of which could give wildly different outputs. Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).

Now, I would be in favor of a reasonable extrapolation of my own values. But humanity’s values are not my values. There are people who want to spread life throughout the universe regardless of suffering, people who want to preserve nature free from human interference, people who want to create lab universes because it would be cool, people who oppose utilitronium and support retaining suffering in the world, people who want to send members of other religions to eternal torture, people who believe sinful children should burn forever in red-hot ovens, and on and on. I do not want these values to be part of the mix.

Maybe (hopefully) some of these beliefs would go away once people learned more about what these wishes really implied, but some would not. Take abortion, for example: Some non-religious people genuinely oppose it, and not for trivial, misinformed reasons. They have thought long and hard about abortion and still find it to be wrong. Others have thought long and hard and still find it to be not wrong. At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way. Some human intuitions are irreconcilably opposed to mine, and I don’t want them in the extrapolation process.

Motivation 2: Some argue that even if CEV isn’t ideal, it’s the best game-theoretic approach because it amounts to cooperating on the prisoner’s dilemma

I think the idea is that if you try to promote your specific values above everyone else’s, then you’re timelessly causing other groups of people to make the same decision and push for their own values instead. But if you decide to cooperate with everyone, you timelessly influence others to do the same.

This seems worth considering, but I doubt that the argument is compelling enough to take very seriously. I can almost guarantee that if I decided to start cooperating by working toward CEV, everyone else working to shape the values of the future wouldn’t suddenly jump on board and do the same.
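
To make the disagreement concrete, here is a toy sketch of a one-shot prisoner’s dilemma in Python. Everything in it is an illustrative assumption rather than anything from the CEV paper: the payoff numbers are the usual textbook values, `mirror_prob` is a made-up parameter for how likely the other party is to end up making the same choice I make (the “timeless” correlation), and `base_coop_prob` is a made-up chance that an uncorrelated party cooperates anyway.

```python
# Toy one-shot prisoner's dilemma. All payoffs, names, and parameters are
# illustrative assumptions, not taken from the CEV paper.

# PAYOFF[(my_move, their_move)] is my payoff.
PAYOFF = {
    ("cooperate", "cooperate"): 3,  # mutual cooperation
    ("cooperate", "defect"): 0,     # I am exploited
    ("defect", "cooperate"): 5,     # I exploit them
    ("defect", "defect"): 1,        # mutual defection
}


def expected_payoff(my_move, mirror_prob, base_coop_prob=0.5):
    """Expected payoff of my_move.

    With probability mirror_prob the other party makes the same move I do.
    Otherwise they choose independently, cooperating with base_coop_prob.
    """
    mirrored = PAYOFF[(my_move, my_move)]
    independent = (
        base_coop_prob * PAYOFF[(my_move, "cooperate")]
        + (1 - base_coop_prob) * PAYOFF[(my_move, "defect")]
    )
    return mirror_prob * mirrored + (1 - mirror_prob) * independent


def best_move(mirror_prob, base_coop_prob=0.5):
    """Return the move with the higher expected payoff."""
    return max(
        ("cooperate", "defect"),
        key=lambda move: expected_payoff(move, mirror_prob, base_coop_prob),
    )


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, best_move(p))
```

With these particular numbers, cooperating comes out ahead only when `mirror_prob` exceeds 3/7. The doubt expressed above amounts to the claim that, for real-world actors shaping the future, the correlation is much closer to zero than to that threshold.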

Objection 1: “Suppose CEV did happen. Then spreading concern for wild animals and the like might have little value, because the CEV process would realize that you had tried to rig the system ahead of time by making more people care about the cause, and it would attempt to neutralize your efforts.”

Well, first of all, CEV is (almost certainly) never going to happen, so I’m not too worried. Second of all, it’s not clear to me that such a scheme would actually be put in place. If you’re trying to undo the pre-CEV influences that shaped the distribution of opinions up to that point, you’re going to have a heck of a lot of undoing to do. Are you going to undo the abundance of Catholics because their religion discouraged birth control and so led to large numbers of supporters? Are you going to undo the over-representation of healthy humans because natural selection unfairly removed all those sickly ones? Are you going to undo the under-representation of dinosaurs because an arbitrary asteroid killed them off before CEV came around?

The fact is that who has power at the time of AGI will probably matter a lot. If we can improve the values of those who will have power in the future, this will in expectation lead to better outcomes—regardless of whether the CEV fairy tale comes true.