
Coherent Extrapolated Volition

Tag. Last edit: 16 Sep 2020 22:42 UTC by Ruby

Coherent Extrapolated Volition (CEV) is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that explicitly programming what we think our desires and motivations are into an AI would not be sufficient. Instead, we should find a way to program it so that it acts in our best interests: what we want it to do, not what we tell it to do.

Related: Friendly AI, Metaethics Sequence, Complexity of Value

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.

CEV is also often used more generally to refer to what the idealized version of a person would want, separate from the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you are facing two boxes, A and B. One of these boxes, and only one, has a diamond in it: box B. You are asked to guess which box to choose, and you choose to open box A. It was your decision to take box A, but your volition was to choose box B, since what you wanted in the first place was the diamond.

Now imagine someone else, Fred, is faced with the same task, and you want to help him by giving him the box he chose, box A. Since you know where the diamond is, simply handing him that box isn’t helping. Instead, you mentally extrapolate a volition for Fred, based on a version of him that knows where the diamond is, and conclude that he actually wants box B.
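The two-box example can be made concrete with a small toy sketch. Everything here is an illustrative assumption rather than anything from Yudkowsky's essay: the names `Agent`, `decide`, and `extrapolated_volition` are hypothetical, and Fred's guess is modeled as a mistaken belief about which box holds the diamond. The sketch only shows the gap between what an agent decides and what a better-informed version of the same agent would decide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Agent:
    name: str
    goal: str                 # what the agent is ultimately after
    beliefs: Dict[str, str]   # box label -> what the agent thinks is inside


def decide(agent: Agent, boxes: List[str]) -> str:
    """Return the box the agent picks when acting on its own beliefs."""
    for box in boxes:
        if agent.beliefs.get(box) == agent.goal:
            return box
    return boxes[0]  # arbitrary fallback when no box looks promising


def extrapolated_volition(agent: Agent, world: Dict[str, str]) -> str:
    """Return the box an informed version of the agent would pick.

    Toy assumption: "knowing more" just means replacing the agent's
    beliefs with the true contents of the boxes.
    """
    informed = Agent(agent.name, agent.goal, dict(world))
    return decide(informed, list(world))


# True state of the world: the diamond is in box B.
world = {"A": "nothing", "B": "diamond"}

# Fred wants the diamond but (by assumption) guesses it is in box A.
fred = Agent("Fred", "diamond", {"A": "diamond", "B": "nothing"})

decision = decide(fred, ["A", "B"])
volition = extrapolated_volition(fred, world)
print(decision, volition)
```

Here Fred's decision and his extrapolated volition come apart: handing him the box he asked for satisfies the decision, while handing him the other box satisfies the volition.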

Coherent Extrapolated Volition

In developing Friendly AI, one acting in our best interests, we would have to take care that it implements, from the beginning, a coherent extrapolated volition of humankind. In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.

The main problems with CEV include, first, the great difficulty of implementing such a program: “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Second, human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there is a “principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live” and that his essay essentially conflated the two.


[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren, 25 Feb 2021 22:06 UTC
3 points
2 comments, 1 min read

Superintelligence 23: Coherent extrapolated volition

KatjaGrace, 17 Feb 2015 2:00 UTC
11 points
97 comments, 7 min read

Objections to Coherent Extrapolated Volition

XiXiDu, 22 Nov 2011 10:32 UTC
12 points
56 comments, 3 min read

CEV: coherence versus extrapolation

Stuart_Armstrong, 22 Sep 2014 11:24 UTC
21 points
17 comments, 2 min read

Harsanyi’s Social Aggregation Theorem and what it means for CEV

AlexMennen, 5 Jan 2013 21:38 UTC
37 points
88 comments, 4 min read

CEV: a utilitarian critique

Pablo, 26 Jan 2013 16:12 UTC
32 points
94 comments, 5 min read

Ideal Advisor Theories and Personal CEV

lukeprog, 25 Dec 2012 13:04 UTC
35 points
35 comments, 10 min read

CEV-tropes

snarles, 22 Sep 2014 18:21 UTC
12 points
15 comments, 1 min read

CEV-inspired models

Stuart_Armstrong, 7 Dec 2011 18:35 UTC
10 points
43 comments, 1 min read

Hacking the CEV for Fun and Profit

Wei_Dai, 3 Jun 2010 20:30 UTC
75 points
207 comments, 1 min read

Mahatma Armstrong: CEVed to death.

Stuart_Armstrong, 6 Jun 2013 12:50 UTC
32 points
62 comments, 2 min read

Stanovich on CEV

lukeprog, 29 Apr 2012 9:37 UTC
19 points
6 comments, 3 min read

Troubles With CEV Part2 - CEV Sequence

diegocaleiro, 28 Feb 2012 4:19 UTC
10 points
7 comments, 10 min read

Troubles With CEV Part1 - CEV Sequence

diegocaleiro, 28 Feb 2012 4:15 UTC
6 points
10 comments, 8 min read

Criticisms of CEV (request for links)

Kevin, 16 Nov 2010 4:02 UTC
10 points
29 comments, 1 min read

Concerns Surrounding CEV: A case for human friendliness first

ai-crotes, 22 Jan 2020 21:03 UTC
1 point
19 comments, 1 min read

Two questions about CEV that worry me

cousin_it, 23 Dec 2010 15:58 UTC
37 points
142 comments, 1 min read

Beginning resources for CEV research

lukeprog, 7 May 2011 5:28 UTC
20 points
32 comments, 2 min read

Topics to discuss CEV

diegocaleiro, 6 Jul 2011 14:19 UTC
8 points
13 comments, 2 min read

[Link] FreakoStats and CEV

Filipe, 6 Jun 2012 15:21 UTC
4 points
40 comments, 2 min read

On What Selves Are—CEV sequence

diegocaleiro, 14 Feb 2012 19:21 UTC
−8 points
17 comments, 11 min read

In favour of a selective CEV initial dynamic

[deleted], 21 Oct 2011 17:33 UTC
16 points
114 comments, 11 min read

A Difficulty in the Concept of CEV

[deleted], 27 Mar 2013 1:20 UTC
6 points
23 comments, 1 min read

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala, 3 Oct 2017 17:39 UTC
3 points
0 comments, 1 min read
(papers.ssrn.com)

Why the beliefs/values dichotomy?

Wei_Dai, 20 Oct 2009 16:35 UTC
28 points
156 comments, 2 min read

Mirrors and Paintings

Eliezer Yudkowsky, 23 Aug 2008 0:29 UTC
18 points
42 comments, 8 min read

MIRIx Part I: Insufficient Values

16 Jun 2021 14:33 UTC
29 points
15 comments, 6 min read