
Human Values

Last edit: 16 Sep 2021 14:50 UTC by plex

Human values are the things we care about and would want an aligned superintelligence to look after and support. True human values are suspected to be highly complex and could be extrapolated into a wide variety of forms.

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin, 16 Jan 2019 13:59 UTC
49 points
3 comments, 15 min read, LW link

Ends: An Introduction

Rob Bensinger, 11 Mar 2015 19:00 UTC
13 points
0 comments, 4 min read, LW link

Review: Foragers, Farmers, and Fossil Fuels

LRudL, 2 Sep 2021 17:59 UTC
22 points
7 comments, 25 min read, LW link
(strataoftheworld.blogspot.com)

Utilons vs. Hedons

Psychohistorian, 10 Aug 2009 19:20 UTC
35 points
119 comments, 6 min read, LW link

Upcoming stability of values

Stuart_Armstrong, 15 Mar 2018 11:36 UTC
15 points
15 comments, 2 min read, LW link

Would I think for ten thousand years?

Stuart_Armstrong, 11 Feb 2019 19:37 UTC
25 points
13 comments, 1 min read, LW link

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong, 28 Feb 2018 16:55 UTC
10 points
3 comments, 1 min read, LW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong, 28 Feb 2018 16:55 UTC
10 points
4 comments, 2 min read, LW link

Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’

Wei_Dai, 11 Oct 2020 18:07 UTC
44 points
18 comments, 1 min read, LW link
(www.nytimes.com)

Modeling humans: what’s the point?

Charlie Steiner, 10 Nov 2020 1:30 UTC
10 points
1 comment, 3 min read, LW link

Normativity

abramdemski, 18 Nov 2020 16:52 UTC
46 points
11 comments, 9 min read, LW link

Mental subagent implications for AI Safety

moridinamael, 3 Jan 2021 18:59 UTC
11 points
0 comments, 3 min read, LW link

Book Review: A Pattern Language by Christopher Alexander

lincolnquirk, 15 Oct 2021 1:11 UTC
34 points
7 comments, 2 min read, LW link

Why the Problem of the Criterion Matters

G Gordon Worley III, 30 Oct 2021 20:44 UTC
24 points
4 comments, 8 min read, LW link

Value Notion—Questions to Ask

aysajan, 17 Jan 2022 15:35 UTC
5 points
0 comments, 4 min read, LW link

Worse than an unaligned AGI

shminux, 10 Apr 2022 3:35 UTC
5 points
12 comments, 1 min read, LW link

A broad basin of attraction around human values?

Wei_Dai, 12 Apr 2022 5:15 UTC
104 points
16 comments, 2 min read, LW link

[Question] How path-dependent are human values?

Ege Erdil, 15 Apr 2022 9:34 UTC
13 points
13 comments, 2 min read, LW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright, 1 Jun 2022 18:45 UTC
1 point
6 comments, 1 min read, LW link

Silliness

lsusr, 3 Jun 2022 4:59 UTC
18 points
0 comments, 1 min read, LW link

Complex Behavior from Simple (Sub)Agents

moridinamael, 10 May 2019 21:44 UTC
108 points
13 comments, 9 min read, LW link, 1 review

Preference synthesis illustrated: Star Wars

Stuart_Armstrong, 9 Jan 2020 16:47 UTC
19 points
8 comments, 3 min read, LW link

Inner Goodness

Eliezer Yudkowsky, 23 Oct 2008 22:19 UTC
20 points
31 comments, 7 min read, LW link

Invisible Frameworks

Eliezer Yudkowsky, 22 Aug 2008 3:36 UTC
19 points
47 comments, 6 min read, LW link

Terminal Bias

[deleted], 30 Jan 2012 21:03 UTC
23 points
125 comments, 6 min read, LW link

In Praise of Maximizing – With Some Caveats

David Althaus, 15 Mar 2015 19:40 UTC
31 points
19 comments, 10 min read, LW link

Not for the Sake of Selfishness Alone

lukeprog, 2 Jul 2011 17:37 UTC
34 points
20 comments, 8 min read, LW link

What’s wrong with simplicity of value?

Wei_Dai, 27 Jul 2011 3:09 UTC
29 points
40 comments, 1 min read, LW link

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren, 25 Feb 2021 22:06 UTC
3 points
2 comments, 1 min read, LW link

Quick thoughts on empathic metaethics

lukeprog, 12 Dec 2017 21:46 UTC
26 points
0 comments, 9 min read, LW link

Notes on Judgment and Righteous Anger

David_Gross, 30 Jan 2021 19:31 UTC
12 points
1 comment, 6 min read, LW link

The Dark Side of Cognition Hypothesis

Cameron Berg, 3 Oct 2021 20:10 UTC
19 points
1 comment, 16 min read, LW link

Thought experiment: coarse-grained VR utopia

cousin_it, 14 Jun 2017 8:03 UTC
27 points
48 comments, 1 min read, LW link

Human values differ as much as values can differ

PhilGoetz, 3 May 2010 19:35 UTC
27 points
220 comments, 7 min read, LW link

Selfishness, preference falsification, and AI alignment

jessicata, 28 Oct 2021 0:16 UTC
50 points
29 comments, 13 min read, LW link
(unstableontology.com)

Value is Fragile

Eliezer Yudkowsky, 29 Jan 2009 8:46 UTC
129 points
110 comments, 6 min read, LW link

The Gift We Give To Tomorrow

Eliezer Yudkowsky, 17 Jul 2008 6:07 UTC
86 points
99 comments, 8 min read, LW link

Converging toward a Million Worlds

Joe Kwon, 24 Dec 2021 21:33 UTC
8 points
1 comment, 3 min read, LW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg, 11 Feb 2022 22:23 UTC
5 points
1 comment, 10 min read, LW link

Question 4: Implementing the control proposals

Cameron Berg, 13 Feb 2022 17:12 UTC
6 points
2 comments, 5 min read, LW link

Why No *Interesting* Unaligned Singularity?

David Udell, 20 Apr 2022 0:34 UTC
11 points
13 comments, 1 min read, LW link

The Unified Theory of Normative Ethics

Thane Ruthenis, 17 Jun 2022 19:55 UTC
8 points
0 comments, 6 min read, LW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022 15:05 UTC
28 points
1 comment, 14 min read, LW link