Coherent Extrapolated Volition


Coherent Extrapolated Volition was a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It expresses the argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – doing what we want it to do, not merely what we tell it to do.

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.
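
The description above is conceptual rather than algorithmic, but the structure it sketches – extrapolate each person's volition under idealization, keep only the parts where the extrapolations cohere, and feed those into a utility function – can be illustrated in toy form. The Python sketch below is purely illustrative: the `idealize` callback stands in for the entire (unsolved) problem of extrapolation, and all names and thresholds are hypothetical.

```python
from typing import Callable, Dict, List

Preferences = Dict[str, float]  # outcome -> signed strength of preference

def coherent_extrapolated_volition(
    people: List[Preferences],
    idealize: Callable[[Preferences], Preferences],
    agreement_threshold: float = 0.9,
) -> Preferences:
    """Toy aggregation: extrapolate every person's preferences under an
    idealization step ("knew more, thought faster, ..."), then keep only
    the outcomes on which the extrapolated volitions largely agree."""
    extrapolated = [idealize(p) for p in people]
    outcomes = {o for prefs in extrapolated for o in prefs}
    cev: Preferences = {}
    for outcome in outcomes:
        weights = [prefs.get(outcome, 0.0) for prefs in extrapolated]
        favour = sum(w > 0 for w in weights) / len(weights)
        oppose = sum(w < 0 for w in weights) / len(weights)
        # Only coherent (strongly convergent) desires enter the utility
        # function; contested outcomes are left open rather than decided here.
        if favour >= agreement_threshold or oppose >= agreement_threshold:
            cev[outcome] = sum(weights) / len(weights)
    return cev
```

A real initial dynamic would also iterate this recursively, re-extrapolating in light of earlier results, rather than running the single pass shown here.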

CEV is also often used more generally to refer to what an idealized version of a person would want, separate from the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you are facing two boxes, A and B. Exactly one of them contains a diamond – box B – but you do not know which. Asked to choose a box, you open box A. Your decision was to take box A, but your volition was to choose box B, since what you wanted all along was the diamond.

Now imagine someone else – Fred – faces the same task, and you want to help him by handing him the box he chose, box A. Since you know the diamond is in box B, simply handing him box A does not actually help him. Instead, you mentally extrapolate a volition for Fred: you model a version of him that knows where the diamond is, and conclude that what he really wants is box B.
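
The distinction the thought experiment draws – between the decision a person actually makes and the choice their better-informed self would make – can be restated as a small contrast between optimizing under someone's beliefs and optimizing under the facts. A minimal, purely illustrative sketch (all names and values are hypothetical, not part of any proposed system):

```python
def choose(box_contents: dict, wants: str) -> str:
    """Pick the box whose (believed or actual) contents match what is wanted."""
    return next(box for box, contents in box_contents.items() if contents == wants)

# Fred wants the diamond. He believes it is in box A; in fact it is in box B.
freds_beliefs = {"A": "diamond", "B": "empty"}
the_facts     = {"A": "empty",   "B": "diamond"}

decision = choose(freds_beliefs, "diamond")  # "A": what Fred actually chooses
volition = choose(the_facts, "diamond")      # "B": what an informed Fred would choose
print(decision, volition)
```

Extrapolated volition generalizes this move: instead of only correcting factual beliefs about a diamond, it also idealizes how the person reasons and who they wish they were.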

Coherent Extrapolated Volition

“The “Coherent” in “Coherent Extrapolated Volition” does not indicate the idea that an extrapolated volition is necessarily coherent. The “Coherent” part indicates the idea that if you build an FAI and run it on an extrapolated human, the FAI should only act on the coherent parts. Where there are multiple attractors, the FAI should hold satisficing avenues open, not try to decide itself.”—Eliezer Yudkowsky

In developing friendly AI – one acting in our best interests – we would have to take care that it implements, from the beginning, the coherent extrapolated volition of humankind. As described above, the AI would extrapolate what an idealized humanity would want, determine which of those desires converge, and use that initial dynamic to generate its utility function.

The main problems with CEV are, first, the great difficulty of implementing such a program – “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless” – and, second, the possibility that human values may not converge at all. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004: he notes that there is a “principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live”, and that his essay essentially conflated the two definitions.

Mirrors and Paintings

23 Aug 2008 0:29 UTC
28 points

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

1 Feb 2024 21:15 UTC
3 points

Requirements for a Basin of Attraction to Alignment

14 Feb 2024 7:10 UTC
20 points

The self-unalignment problem

14 Apr 2023 12:10 UTC
144 points

Is it time to start thinking about what AI Friendliness means?

11 Apr 2022 9:32 UTC
18 points

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

25 Feb 2021 22:06 UTC
4 points

Solving For Meta-Ethics By Inducing From The Self

20 Jan 2023 7:21 UTC
4 points

[NSFW Review] Interspecies Reviewers

1 Apr 2022 11:09 UTC
52 points

CEV: coherence versus extrapolation

22 Sep 2014 11:24 UTC
21 points

Stanovich on CEV

29 Apr 2012 9:37 UTC
19 points

Concept extrapolation: key posts

19 Apr 2022 10:01 UTC
13 points

CEV-inspired models

7 Dec 2011 18:35 UTC
10 points

CEV: a utilitarian critique

26 Jan 2013 16:12 UTC
32 points

Hacking the CEV for Fun and Profit

3 Jun 2010 20:30 UTC
78 points

CEV-tropes

22 Sep 2014 18:21 UTC
12 points

Humanity as an entity: An alternative to Coherent Extrapolated Volition

22 Apr 2022 12:48 UTC
2 points

Contrary to List of Lethality’s point 22, alignment’s door number 2

14 Dec 2022 22:01 UTC
−2 points

Coherent extrapolated dreaming

26 Dec 2022 17:29 UTC
38 points

[Question] What would the creation of aligned AGI look like for us?

8 Apr 2022 18:05 UTC
3 points

How Would an Utopia-Maximizer Look Like?

20 Dec 2023 20:01 UTC
31 points

A problem with the most recently published version of CEV

23 Aug 2023 18:05 UTC
7 points

Preference Aggregation as Bayesian Inference

27 Jul 2023 17:59 UTC
14 points

Beginning resources for CEV research

7 May 2011 5:28 UTC
20 points

Towards an Ethics Calculator for Use by an AGI

12 Dec 2023 18:37 UTC
2 points

On What Selves Are—CEV sequence

14 Feb 2012 19:21 UTC
−8 points

In favour of a selective CEV initial dynamic

21 Oct 2011 17:33 UTC
16 points

A Difficulty in the Concept of CEV

27 Mar 2013 1:20 UTC
6 points

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

3 Oct 2017 17:39 UTC
3 points
(papers.ssrn.com)

Why the beliefs/values dichotomy?

20 Oct 2009 16:35 UTC
29 points

Insufficient Values

16 Jun 2021 14:33 UTC
31 points

Morphological intelligence, superhuman empathy, and ethical arbitration

13 Feb 2023 10:25 UTC
1 point

Topics to discuss CEV

6 Jul 2011 14:19 UTC
8 points

Cognitive Neuroscience, Arrow’s Impossibility Theorem, and Coherent Extrapolated Volition

25 Sep 2011 11:15 UTC
26 points

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

15 Apr 2019 3:23 UTC
12 points

Open-ended ethics of phenomena (a desiderata with universal morality)

8 Nov 2023 20:10 UTC
1 point

Scientism vs. people

18 Apr 2023 17:28 UTC
4 points

Open-ended/Phenomenal Ethics (TLDR)

9 Nov 2023 16:58 UTC
3 points

Optionality approach to ethics

13 Nov 2023 15:23 UTC
7 points

Why small phenomenons are relevant to morality

13 Nov 2023 15:25 UTC
1 point

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

2 Dec 2023 14:07 UTC
26 points

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

2 Jun 2023 21:54 UTC
7 points

Philosophical Cyborg (Part 1)

14 Jun 2023 16:20 UTC
31 points

Superintelligence 23: Coherent extrapolated volition

17 Feb 2015 2:00 UTC
15 points

Objections to Coherent Extrapolated Volition

22 Nov 2011 10:32 UTC
12 points

Harsanyi’s Social Aggregation Theorem and what it means for CEV

5 Jan 2013 21:38 UTC
38 points

Ideal Advisor Theories and Personal CEV

25 Dec 2012 13:04 UTC
35 points

Mahatma Armstrong: CEVed to death.

6 Jun 2013 12:50 UTC
33 points

Troubles With CEV Part2 - CEV Sequence

28 Feb 2012 4:19 UTC
10 points

Troubles With CEV Part1 - CEV Sequence

28 Feb 2012 4:15 UTC
6 points

Criticisms of CEV (request for links)

16 Nov 2010 4:02 UTC
10 points