
Human Values

Last edit: 16 Sep 2021 14:50 UTC by plex

Human values are the things we care about and would want an aligned superintelligence to look after and support. True human values are suspected to be highly complex, and could be extrapolated into a wide variety of forms.

The shard theory of human values

4 Sep 2022 4:28 UTC
225 points
59 comments · 24 min read · LW link

Human values & biases are inaccessible to the genome

TurnTrout · 7 Jul 2022 17:29 UTC
93 points
51 comments · 6 min read · LW link

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin · 16 Jan 2019 13:59 UTC
50 points
3 comments · 15 min read · LW link

Ends: An Introduction

Rob Bensinger · 11 Mar 2015 19:00 UTC
12 points
0 comments · 4 min read · LW link

Review: Foragers, Farmers, and Fossil Fuels

LRudL · 2 Sep 2021 17:59 UTC
23 points
7 comments · 25 min read · LW link
(strataoftheworld.blogspot.com)

Shard Theory: An Overview

David Udell · 11 Aug 2022 5:44 UTC
141 points
34 comments · 10 min read · LW link

How evolution succeeds and fails at value alignment

Ocracoke · 21 Aug 2022 7:14 UTC
21 points
2 comments · 4 min read · LW link

Brain-over-body biases, and the embodied value problem in AI alignment

geoffreymiller · 24 Sep 2022 22:24 UTC
10 points
6 comments · 25 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay · 26 Oct 2022 1:24 UTC
1 point
10 comments · 3 min read · LW link

Utilons vs. Hedons

Psychohistorian · 10 Aug 2009 19:20 UTC
35 points
119 comments · 6 min read · LW link

Upcoming stability of values

Stuart_Armstrong · 15 Mar 2018 11:36 UTC
15 points
15 comments · 2 min read · LW link

Would I think for ten thousand years?

Stuart_Armstrong · 11 Feb 2019 19:37 UTC
25 points
13 comments · 1 min read · LW link

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong · 28 Feb 2018 16:55 UTC
10 points
3 comments · 1 min read · LW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong · 28 Feb 2018 16:55 UTC
10 points
4 comments · 2 min read · LW link

Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’

Wei_Dai · 11 Oct 2020 18:07 UTC
44 points
18 comments · 1 min read · LW link
(www.nytimes.com)

Modeling humans: what’s the point?

Charlie Steiner · 10 Nov 2020 1:30 UTC
10 points
1 comment · 3 min read · LW link

Normativity

abramdemski · 18 Nov 2020 16:52 UTC
46 points
11 comments · 9 min read · LW link

Mental subagent implications for AI Safety

moridinamael · 3 Jan 2021 18:59 UTC
11 points
0 comments · 3 min read · LW link

Book Review: A Pattern Language by Christopher Alexander

lincolnquirk · 15 Oct 2021 1:11 UTC
52 points
8 comments · 2 min read · LW link · 1 review

Why the Problem of the Criterion Matters

Gordon Seidoh Worley · 30 Oct 2021 20:44 UTC
24 points
9 comments · 8 min read · LW link

Value Notion—Questions to Ask

aysajan · 17 Jan 2022 15:35 UTC
5 points
0 comments · 4 min read · LW link

Worse than an unaligned AGI

shminux · 10 Apr 2022 3:35 UTC
−1 points
12 comments · 1 min read · LW link

A broad basin of attraction around human values?

Wei_Dai · 12 Apr 2022 5:15 UTC
106 points
17 comments · 2 min read · LW link

[Question] How path-dependent are human values?

Ege Erdil · 15 Apr 2022 9:34 UTC
13 points
13 comments · 2 min read · LW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright · 1 Jun 2022 18:45 UTC
1 point
6 comments · 1 min read · LW link

Silliness

lsusr · 3 Jun 2022 4:59 UTC
18 points
0 comments · 1 min read · LW link

Humans provide an untapped wealth of evidence about alignment

14 Jul 2022 2:31 UTC
178 points
93 comments · 9 min read · LW link

Notes on Temperance

David Gross · 9 Nov 2020 2:33 UTC
14 points
2 comments · 7 min read · LW link

Understanding and avoiding value drift

TurnTrout · 9 Sep 2022 4:16 UTC
41 points
9 comments · 6 min read · LW link

AI alignment with humans… but with which humans?

geoffreymiller · 9 Sep 2022 18:21 UTC
12 points
33 comments · 3 min read · LW link

The heterogeneity of human value types: Implications for AI alignment

geoffreymiller · 23 Sep 2022 17:03 UTC
10 points
2 comments · 10 min read · LW link

[Question] Does the existence of shared human values imply alignment is “easy”?

Morpheus · 26 Sep 2022 18:01 UTC
7 points
14 comments · 1 min read · LW link

Data for IRL: What is needed to learn human values?

Jan Wehner · 3 Oct 2022 9:23 UTC
18 points
6 comments · 12 min read · LW link

Learning societal values from law as part of an AGI alignment strategy

John Nay · 21 Oct 2022 2:03 UTC
3 points
18 comments · 54 min read · LW link

love, not competition

carado · 30 Oct 2022 19:44 UTC
30 points
20 comments · 1 min read · LW link
(carado.moe)

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · 29 Nov 2022 6:23 UTC
57 points
41 comments · 15 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · 13 Dec 2022 16:56 UTC
8 points
3 comments · 1 min read · LW link
(www.quantamagazine.org)

Ordinary human life

David Hugh-Jones · 17 Dec 2022 16:46 UTC
24 points
1 comment · 14 min read · LW link
(wyclif.substack.com)

Positive values seem more robust and lasting than prohibitions

TurnTrout · 17 Dec 2022 21:43 UTC
46 points
12 comments · 2 min read · LW link

My Model Of EA Burnout

LoganStrohl · 25 Jan 2023 17:52 UTC
224 points
48 comments · 5 min read · LW link

Complex Behavior from Simple (Sub)Agents

moridinamael · 10 May 2019 21:44 UTC
110 points
13 comments · 9 min read · LW link · 1 review

Preference synthesis illustrated: Star Wars

Stuart_Armstrong · 9 Jan 2020 16:47 UTC
19 points
8 comments · 3 min read · LW link

Inner Goodness

Eliezer Yudkowsky · 23 Oct 2008 22:19 UTC
27 points
31 comments · 7 min read · LW link

Invisible Frameworks

Eliezer Yudkowsky · 22 Aug 2008 3:36 UTC
22 points
47 comments · 6 min read · LW link

Terminal Bias

[deleted] · 30 Jan 2012 21:03 UTC
24 points
125 comments · 6 min read · LW link

In Praise of Maximizing – With Some Caveats

David Althaus · 15 Mar 2015 19:40 UTC
31 points
19 comments · 10 min read · LW link

Not for the Sake of Selfishness Alone

lukeprog · 2 Jul 2011 17:37 UTC
34 points
20 comments · 8 min read · LW link

What’s wrong with simplicity of value?

Wei_Dai · 27 Jul 2011 3:09 UTC
29 points
40 comments · 1 min read · LW link

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren · 25 Feb 2021 22:06 UTC
4 points
2 comments · 1 min read · LW link

Quick thoughts on empathic metaethics

lukeprog · 12 Dec 2017 21:46 UTC
27 points
0 comments · 9 min read · LW link

Notes on Judgment and Righteous Anger

David Gross · 30 Jan 2021 19:31 UTC
12 points
1 comment · 6 min read · LW link

The Dark Side of Cognition Hypothesis

Cameron Berg · 3 Oct 2021 20:10 UTC
19 points
1 comment · 16 min read · LW link

Thought experiment: coarse-grained VR utopia

cousin_it · 14 Jun 2017 8:03 UTC
27 points
48 comments · 1 min read · LW link

Human values differ as much as values can differ

PhilGoetz · 3 May 2010 19:35 UTC
27 points
220 comments · 7 min read · LW link

Selfishness, preference falsification, and AI alignment

jessicata · 28 Oct 2021 0:16 UTC
52 points
28 comments · 13 min read · LW link
(unstableontology.com)

Value is Fragile

Eliezer Yudkowsky · 29 Jan 2009 8:46 UTC
146 points
111 comments · 6 min read · LW link

The Gift We Give To Tomorrow

Eliezer Yudkowsky · 17 Jul 2008 6:07 UTC
103 points
99 comments · 8 min read · LW link

Converging toward a Million Worlds

Joe Kwon · 24 Dec 2021 21:33 UTC
9 points
1 comment · 3 min read · LW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg · 11 Feb 2022 22:23 UTC
5 points
1 comment · 10 min read · LW link

Question 4: Implementing the control proposals

Cameron Berg · 13 Feb 2022 17:12 UTC
6 points
2 comments · 5 min read · LW link

Why No *Interesting* Unaligned Singularity?

David Udell · 20 Apr 2022 0:34 UTC
12 points
14 comments · 1 min read · LW link

The Unified Theory of Normative Ethics

Thane Ruthenis · 17 Jun 2022 19:55 UTC
8 points
0 comments · 6 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022 15:05 UTC
30 points
1 comment · 14 min read · LW link

Research Notes: What are we aligning for?

Shoshannah Tekofsky · 8 Jul 2022 22:13 UTC
19 points
8 comments · 2 min read · LW link

Where Utopias Go Wrong, or: The Four Little Planets

ExCeph · 27 May 2022 1:24 UTC
15 points
0 comments · 11 min read · LW link
(ginnungagapfoundation.wordpress.com)

Don’t be a Maxi

Cole Killian · 31 Jul 2022 23:59 UTC
15 points
7 comments · 2 min read · LW link
(colekillian.com)

Content generation. Where do we draw the line?

Q Home · 9 Aug 2022 10:51 UTC
6 points
7 comments · 2 min read · LW link

Broad Picture of Human Values

Thane Ruthenis · 20 Aug 2022 19:42 UTC
36 points
5 comments · 10 min read · LW link

Alignment via prosocial brain algorithms

Cameron Berg · 12 Sep 2022 13:48 UTC
42 points
28 comments · 6 min read · LW link

Should AI learn human values, human norms or something else?

Q Home · 17 Sep 2022 6:19 UTC
5 points
1 comment · 4 min read · LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam · 16 Nov 2022 15:33 UTC
12 points
2 comments · 12 min read · LW link
(sambrown.eu)

[Hebbian Natural Abstractions] Introduction

21 Nov 2022 20:34 UTC
34 points
3 comments · 4 min read · LW link
(www.snellessen.com)

The Opportunity and Risks of Learning Human Values In-Context

Zachary Robertson · 10 Dec 2022 21:40 UTC
2 points
4 comments · 5 min read · LW link

[Question] [DISC] Are Values Robust?

DragonGod · 21 Dec 2022 1:00 UTC
12 points
8 comments · 2 min read · LW link

Contra Steiner on Too Many Natural Abstractions

DragonGod · 24 Dec 2022 17:42 UTC
10 points
6 comments · 1 min read · LW link

[Hebbian Natural Abstractions] Mathematical Foundations

25 Dec 2022 20:58 UTC
15 points
2 comments · 6 min read · LW link
(www.snellessen.com)

Our kind as optionality-optimizing oracles

Logan Kieller · 9 Jan 2023 16:34 UTC
4 points
2 comments · 4 min read · LW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha · 20 Feb 2023 0:55 UTC
10 points
2 comments · 18 min read · LW link

A foundation model approach to value inference

sen · 21 Feb 2023 5:09 UTC
6 points
0 comments · 3 min read · LW link

Just How Hard a Problem is Alignment?

Roger Dearnaley · 25 Feb 2023 9:00 UTC
−1 points
1 comment · 21 min read · LW link

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah · 19 Oct 2019 0:30 UTC
60 points
12 comments · 15 min read · LW link
(mailchi.mp)

AGI will know: Humans are not Rational

HumaneAutomation · 20 Mar 2023 18:46 UTC
0 points
10 comments · 2 min read · LW link