
Coherent Extrapolated Volition

Tag. Last edit: 13 Dec 2023 16:11 UTC by Yoav Ravid

Coherent Extrapolated Volition (CEV) is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI. Instead, we should find a way to program it so that it acts in our best interests: what we want it to do, not what we tell it to do.

Related: Friendly AI, Metaethics Sequence, Complexity of Value

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.

CEV is also often used more generally to refer to what an idealized version of a person would want, outside the context of building aligned AIs.

What is volition?

As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you’re facing two boxes, A and B. One of these boxes, and only one, has a diamond in it: box B. You are asked to guess which box to choose, and you choose to open box A. It was your decision to take box A, but your volition was to choose box B, since you wanted the diamond in the first place.

Now imagine someone else, Fred, is faced with the same task, and you want to help him by giving him the box he chose, box A. Since you know where the diamond is, simply handing him that box isn’t helping. Instead, you mentally extrapolate a volition for Fred, based on a version of him that knows where the diamond is, and conclude that he actually wants box B.
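The difference between Fred’s decision and his extrapolated volition can be illustrated with a toy sketch in Python. Everything in it (the `Agent` class, the `world` dictionary, the function names) is a hypothetical construction for this example, not part of Yudkowsky’s proposal. It only captures the “if we knew more” part of extrapolation, and assumes a world where the agent’s goal is known and the facts are a simple lookup.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    goal: str               # what the agent is ultimately after (assumed known)
    believed_location: str  # the box the agent currently thinks holds the goal


def decision(agent: Agent) -> str:
    """The box the agent actually picks, given its current beliefs."""
    return agent.believed_location


def extrapolated_volition(agent: Agent, world: dict[str, str]) -> str:
    """The box the agent would pick if it knew what each box contains."""
    for box, contents in world.items():
        if contents == agent.goal:
            return box
    # No box contains the goal: fall back to the agent's actual choice.
    return decision(agent)


# Hypothetical setup mirroring the thought experiment: the diamond is in box B.
world = {"A": "nothing", "B": "diamond"}
fred = Agent(name="Fred", goal="diamond", believed_location="A")

print(decision(fred))                      # A
print(extrapolated_volition(fred, world))  # B
```

Handing Fred the result of `decision` gives him what he asked for; handing him the result of `extrapolated_volition` gives him what he wanted.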

Coherent Extrapolated Volition

“The “Coherent” in “Coherent Extrapolated Volition” does not indicate the idea that an extrapolated volition is necessarily coherent. The “Coherent” part indicates the idea that if you build an FAI and run it on an extrapolated human, the FAI should only act on the coherent parts. Where there are multiple attractors, the FAI should hold satisficing avenues open, not try to decide itself.”—Eliezer Yudkowsky

In developing a Friendly AI, one that acts in our best interests, we would have to ensure that it implements, from the beginning, a coherent extrapolated volition of humankind. In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.
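The idea of acting only where extrapolated desires converge can be pictured with a deliberately simplified sketch. It is not a method for computing CEV, which has no known implementation. It assumes that each person’s extrapolated volition has somehow already been reduced to a dictionary of stances, and the `coherent_parts` function, the issue names, and the 0.9 agreement threshold are all hypothetical choices made for this illustration.

```python
from collections import Counter


def coherent_parts(volitions, threshold=0.9):
    """Return {issue: stance} for issues where at least `threshold` of the
    people who hold a view on that issue share the same stance.

    Issues without that level of agreement are left out, i.e. left open."""
    issues = {issue for volition in volitions for issue in volition}
    agreed = {}
    for issue in sorted(issues):
        stances = Counter(v[issue] for v in volitions if issue in v)
        stance, count = stances.most_common(1)[0]
        if count / sum(stances.values()) >= threshold:
            agreed[issue] = stance
    return agreed


# Hypothetical, already-extrapolated volitions of three people.
volitions = [
    {"cure_disease": "for", "contested_policy": "for"},
    {"cure_disease": "for", "contested_policy": "against"},
    {"cure_disease": "for"},
]

print(coherent_parts(volitions))  # {'cure_disease': 'for'}
```

Here the sketch acts on `cure_disease`, where everyone agrees, and leaves `contested_policy` undecided, loosely matching the idea that the AI should only act on the coherent parts and keep other avenues open.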

The main problems with CEV include, first, the great difficulty of implementing such a program: “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Second, there is the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there’s a “principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live” and that his essay essentially conflated the two.

Further Reading & References

See also

Mirrors and Paintings

Eliezer Yudkowsky, 23 Aug 2008 0:29 UTC
29 points
42 comments, 8 min read

The self-unalignment problem

14 Apr 2023 12:10 UTC
148 points
24 comments, 10 min read

Requirements for a Basin of Attraction to Alignment

RogerDearnaley, 14 Feb 2024 7:10 UTC
38 points
11 comments, 31 min read

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley, 1 Feb 2024 21:15 UTC
14 points
15 comments, 13 min read

Is it time to start thinking about what AI Friendliness means?

Victor Novikov, 11 Apr 2022 9:32 UTC
18 points
6 comments, 3 min read

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren, 25 Feb 2021 22:06 UTC
5 points
2 comments, 1 min read

A problem with the most recently published version of CEV

ThomasCederborg, 23 Aug 2023 18:05 UTC
10 points
7 comments, 8 min read

[NSFW Review] Interspecies Reviewers

lsusr, 1 Apr 2022 11:09 UTC
52 points
8 comments, 2 min read

CEV: coherence versus extrapolation

Stuart_Armstrong, 22 Sep 2014 11:24 UTC
21 points
17 comments, 2 min read

Concept extrapolation: key posts

Stuart_Armstrong, 19 Apr 2022 10:01 UTC
13 points
2 comments, 1 min read

Stanovich on CEV

lukeprog, 29 Apr 2012 9:37 UTC
19 points
6 comments, 3 min read

CEV-inspired models

Stuart_Armstrong, 7 Dec 2011 18:35 UTC
10 points
43 comments, 1 min read

CEV: a utilitarian critique

Pablo, 26 Jan 2013 16:12 UTC
32 points
87 comments, 5 min read

Hacking the CEV for Fun and Profit

Wei Dai, 3 Jun 2010 20:30 UTC
78 points
207 comments, 1 min read

CEV-tropes

snarles, 22 Sep 2014 18:21 UTC
12 points
15 comments, 1 min read

Solving For Meta-Ethics By Inducing From The Self

VisionaryHera, 20 Jan 2023 7:21 UTC
4 points
1 comment, 9 min read

Coherent extrapolated dreaming

Alex Flint, 26 Dec 2022 17:29 UTC
38 points
10 comments, 17 min read

Preference Aggregation as Bayesian Inference

beren, 27 Jul 2023 17:59 UTC
14 points
1 comment, 1 min read

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis, 20 Dec 2023 20:01 UTC
31 points
23 comments, 10 min read

[Question] What would the creation of aligned AGI look like for us?

Perhaps, 8 Apr 2022 18:05 UTC
3 points
4 comments, 1 min read

Humanity as an entity: An alternative to Coherent Extrapolated Volition

Victor Novikov, 22 Apr 2022 12:48 UTC
3 points
2 comments, 4 min read

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name, 14 Dec 2022 22:01 UTC
−2 points
5 comments, 22 min read

Turning Some Inconsistent Preferences into Consistent Ones

niplav, 18 Jul 2022 18:40 UTC
23 points
5 comments, 12 min read

Troubles With CEV Part1 - CEV Sequence

diegocaleiro, 28 Feb 2012 4:15 UTC
6 points
10 comments, 8 min read

Criticisms of CEV (request for links)

Kevin, 16 Nov 2010 4:02 UTC
10 points
29 comments, 1 min read

Concerns Surrounding CEV: A case for human friendliness first

ai-crotes, 22 Jan 2020 21:03 UTC
1 point
19 comments, 1 min read

Two questions about CEV that worry me

cousin_it, 23 Dec 2010 15:58 UTC
37 points
141 comments, 1 min read

Beginning resources for CEV research

lukeprog, 7 May 2011 5:28 UTC
20 points
32 comments, 2 min read

Topics to discuss CEV

diegocaleiro, 6 Jul 2011 14:19 UTC
8 points
13 comments, 2 min read

[Link] FreakoStats and CEV

Filipe, 6 Jun 2012 15:21 UTC
4 points
40 comments, 2 min read

On What Selves Are—CEV sequence

diegocaleiro, 14 Feb 2012 19:21 UTC
−8 points
17 comments, 11 min read

Towards an Ethics Calculator for Use by an AGI

sweenesm, 12 Dec 2023 18:37 UTC
3 points
2 comments, 11 min read

A Difficulty in the Concept of CEV

[deleted], 27 Mar 2013 1:20 UTC
6 points
23 comments, 1 min read

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI’s values)

Kaj_Sotala, 3 Oct 2017 17:39 UTC
3 points
0 comments, 1 min read
(papers.ssrn.com)

Why the beliefs/values dichotomy?

Wei Dai, 20 Oct 2009 16:35 UTC
29 points
156 comments, 2 min read

Insufficient Values

16 Jun 2021 14:33 UTC
31 points
16 comments, 6 min read

Morphological intelligence, superhuman empathy, and ethical arbitration

Roman Leventov, 13 Feb 2023 10:25 UTC
1 point
0 comments, 2 min read

In favour of a selective CEV initial dynamic

[deleted], 21 Oct 2011 17:33 UTC
16 points
114 comments, 11 min read

Cognitive Neuroscience, Arrow’s Impossibility Theorem, and Coherent Extrapolated Volition

lukeprog, 25 Sep 2011 11:15 UTC
26 points
18 comments, 1 min read

Update on Developing an Ethics Calculator to Align an AGI to

sweenesm, 12 Mar 2024 12:33 UTC
4 points
2 comments, 8 min read

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments, 25 min read

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Jade Bishop, 15 Apr 2019 3:23 UTC
12 points
5 comments, 3 min read

Open-ended ethics of phenomena (a desiderata with universal morality)

Ryo, 8 Nov 2023 20:10 UTC
1 point
0 comments, 8 min read

The formal goal is a pointer

Morphism, 1 May 2024 0:27 UTC
20 points
10 comments, 1 min read

Recursion in AI is scary. But let’s talk solutions.

Oleg Trott, 16 Jul 2024 20:34 UTC
3 points
10 comments, 2 min read

[Question] Does VETLM solve AI superalignment?

Oleg Trott, 8 Aug 2024 18:22 UTC
−1 points
10 comments, 1 min read

Scientism vs. people

Roman Leventov, 18 Apr 2023 17:28 UTC
4 points
4 comments, 11 min read

Open-ended/Phenomenal Ethics (TLDR)

Ryo, 9 Nov 2023 16:58 UTC
3 points
0 comments, 1 min read

Optionality approach to ethics

Ryo, 13 Nov 2023 15:23 UTC
7 points
2 comments, 3 min read

Why small phenomenons are relevant to morality

Ryo, 13 Nov 2023 15:25 UTC
1 point
0 comments, 3 min read

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

Adrià Moret, 2 Dec 2023 14:07 UTC
26 points
31 comments, 42 min read

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King, 2 Jun 2023 21:54 UTC
7 points
4 comments, 16 min read

Philosophical Cyborg (Part 1)

14 Jun 2023 16:20 UTC
31 points
4 comments, 13 min read

Superintelligence 23: Coherent extrapolated volition

KatjaGrace, 17 Feb 2015 2:00 UTC
15 points
98 comments, 7 min read

Objections to Coherent Extrapolated Volition

XiXiDu, 22 Nov 2011 10:32 UTC
13 points
56 comments, 3 min read

Harsanyi’s Social Aggregation Theorem and what it means for CEV

AlexMennen, 5 Jan 2013 21:38 UTC
37 points
90 comments, 4 min read

Ideal Advisor Theories and Personal CEV

lukeprog, 25 Dec 2012 13:04 UTC
35 points
35 comments, 10 min read

Mahatma Armstrong: CEVed to death.

Stuart_Armstrong, 6 Jun 2013 12:50 UTC
33 points
62 comments, 2 min read

Troubles With CEV Part2 - CEV Sequence

diegocaleiro, 28 Feb 2012 4:19 UTC
10 points
7 comments, 10 min read