Human Values are the things we care about and would want an aligned superintelligence to look after and support. True human values are suspected to be highly complex, and they could be extrapolated into a wide variety of forms.

The shard theory of human values

4 Sep 2022 4:28 UTC
241 points
66 comments24 min readLW link2 reviews

Multi-agent predictive minds and AI alignment

Jan_Kulveit12 Dec 2018 23:48 UTC
63 points
18 comments10 min readLW link

Human values & biases are inaccessible to the genome

TurnTrout7 Jul 2022 17:29 UTC
93 points
54 comments6 min readLW link1 review

Requirements for a Basin of Attraction to Alignment

RogerDearnaley14 Feb 2024 7:10 UTC
38 points
6 comments31 min readLW link

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin16 Jan 2019 13:59 UTC
52 points
3 comments15 min readLW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley27 Dec 2023 6:42 UTC
33 points
41 comments23 min readLW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley4 Dec 2023 18:31 UTC
12 points
0 comments49 min readLW link

3. Uploading

RogerDearnaley23 Nov 2023 7:39 UTC
21 points
5 comments8 min readLW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley1 Feb 2024 21:15 UTC
13 points
15 comments13 min readLW link

Ends: An Introduction

Rob Bensinger11 Mar 2015 19:00 UTC
16 points
0 comments4 min readLW link

[Valence series] 2. Valence & Normativity

Steven Byrnes7 Dec 2023 16:43 UTC
72 points
5 comments28 min readLW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley24 Nov 2023 4:56 UTC
10 points
0 comments4 min readLW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis20 Dec 2023 20:01 UTC
31 points
23 comments10 min readLW link

Shard Theory: An Overview

David Udell11 Aug 2022 5:44 UTC
162 points
34 comments10 min readLW link

Brain-over-body biases, and the embodied value problem in AI alignment

geoffreymiller24 Sep 2022 22:24 UTC
10 points
6 comments25 min readLW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay26 Oct 2022 1:24 UTC
1 point
10 comments3 min readLW link

How evolution succeeds and fails at value alignment

Ocracoke21 Aug 2022 7:14 UTC
21 points
2 comments4 min readLW link

Review: Foragers, Farmers, and Fossil Fuels

L Rudolf L2 Sep 2021 17:59 UTC
26 points
7 comments25 min readLW link

Notes on Temperance

David Gross9 Nov 2020 2:33 UTC
14 points
2 comments7 min readLW link

A short dialogue on comparability of values

cousin_it20 Dec 2023 14:08 UTC
27 points
7 comments1 min readLW link

Which values are stable under ontology shifts?

Richard_Ngo23 Jul 2022 2:40 UTC
73 points
48 comments3 min readLW link

Trading off Lives

jefftk3 Jan 2024 3:40 UTC
53 points
12 comments2 min readLW link

Shut Up and Divide?

Wei Dai9 Feb 2010 20:09 UTC
114 points
276 comments1 min readLW link

Ontological Crisis in Humans

Wei Dai18 Dec 2012 17:32 UTC
82 points
69 comments4 min readLW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley6 Jul 2024 1:23 UTC
52 points
38 comments24 min readLW link

Mental subagent implications for AI Safety

moridinamael3 Jan 2021 18:59 UTC
11 points
0 comments3 min readLW link

Descriptive vs. specifiable values

TsviBT26 Mar 2023 9:10 UTC
17 points
2 comments2 min readLW link

The Computational Anatomy of Human Values

beren6 Apr 2023 10:33 UTC
70 points
30 comments30 min readLW link

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm24 Oct 2023 13:53 UTC
11 points
0 comments1 min readLW link

It’s OK to be biased towards humans

dr_s11 Nov 2023 11:59 UTC
55 points
69 comments6 min readLW link

Utilons vs. Hedons

Psychohistorian10 Aug 2009 19:20 UTC
39 points
119 comments6 min readLW link

Upcoming stability of values

Stuart_Armstrong15 Mar 2018 11:36 UTC
15 points
15 comments2 min readLW link

Would I think for ten thousand years?

Stuart_Armstrong11 Feb 2019 19:37 UTC
25 points
13 comments1 min readLW link

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong28 Feb 2018 16:55 UTC
10 points
3 comments1 min readLW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong28 Feb 2018 16:55 UTC
10 points
4 comments2 min readLW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c28 Jun 2023 20:44 UTC
29 points
0 comments1 min readLW link

Valuism—an approach to life for you to consider

spencerg19 Jul 2023 15:23 UTC
17 points
2 comments1 min readLW link

Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’

Wei Dai11 Oct 2020 18:07 UTC
44 points
18 comments1 min readLW link

Modeling humans: what’s the point?

Charlie Steiner10 Nov 2020 1:30 UTC
10 points
1 comment3 min readLW link

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross13 Sep 2023 4:17 UTC
7 points
0 comments4 min readLW link


abramdemski18 Nov 2020 16:52 UTC
47 points
11 comments9 min readLW link

Book Review: A Pattern Language by Christopher Alexander

lincolnquirk15 Oct 2021 1:11 UTC
57 points
8 comments2 min readLW link1 review

Why the Problem of the Criterion Matters

Gordon Seidoh Worley30 Oct 2021 20:44 UTC
24 points
9 comments8 min readLW link

Value Notion—Questions to Ask

aysajan17 Jan 2022 15:35 UTC
5 points
0 comments4 min readLW link

Worse than an unaligned AGI

shminux10 Apr 2022 3:35 UTC
−1 points
11 comments1 min readLW link

A broad basin of attraction around human values?

Wei Dai12 Apr 2022 5:15 UTC
113 points
17 comments2 min readLW link

[Question] How path-dependent are human values?

Ege Erdil15 Apr 2022 9:34 UTC
13 points
13 comments2 min readLW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright1 Jun 2022 18:45 UTC
1 point
6 comments1 min readLW link


lsusr3 Jun 2022 4:59 UTC
18 points
0 comments1 min readLW link

Humans provide an untapped wealth of evidence about alignment

14 Jul 2022 2:31 UTC
203 points
94 comments9 min readLW link1 review

Understanding and avoiding value drift

TurnTrout9 Sep 2022 4:16 UTC
48 points
9 comments6 min readLW link

AI alignment with humans… but with which humans?

geoffreymiller9 Sep 2022 18:21 UTC
12 points
33 comments3 min readLW link

The heterogeneity of human value types: Implications for AI alignment

geoffreymiller23 Sep 2022 17:03 UTC
10 points
2 comments10 min readLW link

[Question] Does the existence of shared human values imply alignment is “easy”?

Morpheus26 Sep 2022 18:01 UTC
7 points
15 comments1 min readLW link

Data for IRL: What is needed to learn human values?

Jan Wehner3 Oct 2022 9:23 UTC
18 points
6 comments12 min readLW link

Learning societal values from law as part of an AGI alignment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

love, not competition

Tamsin Leake30 Oct 2022 19:44 UTC
30 points
20 comments1 min readLW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout29 Nov 2022 6:23 UTC
60 points
42 comments15 min readLW link

What Does It Mean to Align AI With Human Values?

Algon13 Dec 2022 16:56 UTC
8 points
3 comments1 min readLW link

Ordinary human life

David Hugh-Jones17 Dec 2022 16:46 UTC
24 points
2 comments14 min readLW link

Positive values seem more robust and lasting than prohibitions

TurnTrout17 Dec 2022 21:43 UTC
51 points
13 comments2 min readLW link

My Model Of EA Burnout

LoganStrohl25 Jan 2023 17:52 UTC
239 points
49 comments5 min readLW link

Notes on Judgment and Righteous Anger

David Gross30 Jan 2021 19:31 UTC
13 points
1 comment6 min readLW link

Agent membranes/boundaries and formalizing “safety”

Chipmonk3 Jan 2024 17:55 UTC
23 points
46 comments3 min readLW link

[Question] [DISC] Are Values Robust?

DragonGod21 Dec 2022 1:00 UTC
12 points
9 comments2 min readLW link

Contra Steiner on Too Many Natural Abstractions

DragonGod24 Dec 2022 17:42 UTC
10 points
6 comments1 min readLW link

Alignment via prosocial brain algorithms

Cameron Berg12 Sep 2022 13:48 UTC
44 points
28 comments6 min readLW link

Should AI learn human values, human norms or something else?

Q Home17 Sep 2022 6:19 UTC
5 points
1 comment4 min readLW link

Is the Endowment Effect Due to Incomparability?

Kevin Dorst10 Jul 2023 16:26 UTC
21 points
10 comments7 min readLW link

The Alignment Problem No One Is Talking About

James Stephen Brown10 May 2024 18:34 UTC
10 points
10 comments2 min readLW link

[Hebbian Natural Abstractions] Mathematical Foundations

25 Dec 2022 20:58 UTC
15 points
2 comments6 min readLW link

Problems with Robin Hanson’s Quillette Article On AI

DaemonicSigil6 Aug 2023 22:13 UTC
89 points
33 comments8 min readLW link

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk3 Jan 2024 17:55 UTC
47 points
3 comments3 min readLW link

Preference synthesis illustrated: Star Wars

Stuart_Armstrong9 Jan 2020 16:47 UTC
20 points
8 comments3 min readLW link

Democratic Fine-Tuning

Joe Edelman29 Aug 2023 18:13 UTC
24 points
2 comments1 min readLW link

“Wanting” and “liking”

Mateusz Bagiński30 Aug 2023 14:52 UTC
22 points
2 comments29 min readLW link

Inner Goodness

Eliezer Yudkowsky23 Oct 2008 22:19 UTC
27 points
31 comments7 min readLW link

Invisible Frameworks

Eliezer Yudkowsky22 Aug 2008 3:36 UTC
27 points
47 comments6 min readLW link

AGI will know: Humans are not Rational

HumaneAutomation20 Mar 2023 18:46 UTC
0 points
10 comments2 min readLW link

Uncovering Latent Human Wellbeing in LLM Embeddings

14 Sep 2023 1:40 UTC
32 points
7 comments8 min readLW link

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

25 Sep 2023 18:55 UTC
3 points
2 comments3 min readLW link

Should Effective Altruists be Valuists instead of utilitarians?

25 Sep 2023 14:03 UTC
1 point
3 comments6 min readLW link

Terminal Bias

[deleted]30 Jan 2012 21:03 UTC
24 points
125 comments6 min readLW link

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC
3 points
2 comments3 min readLW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

In Praise of Maximizing – With Some Caveats

David Althaus15 Mar 2015 19:40 UTC
32 points
19 comments10 min readLW link

Not for the Sake of Selfishness Alone

lukeprog2 Jul 2011 17:37 UTC
34 points
20 comments8 min readLW link

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren25 Feb 2021 22:06 UTC
5 points
2 comments1 min readLW link

Quick thoughts on empathic metaethics

lukeprog12 Dec 2017 21:46 UTC
29 points
0 comments9 min readLW link

Please Understand

samhealy1 Apr 2024 12:33 UTC
29 points
11 comments6 min readLW link

Impossibility of Anthropocentric-Alignment

False Name24 Feb 2024 18:31 UTC
−8 points
2 comments39 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Antagonistic AI

Xybermancer1 Mar 2024 18:50 UTC
−8 points
1 comment1 min readLW link

Thought experiment: coarse-grained VR utopia

cousin_it14 Jun 2017 8:03 UTC
27 points
48 comments1 min readLW link

Human values differ as much as values can differ

PhilGoetz3 May 2010 19:35 UTC
27 points
220 comments7 min readLW link

Selfishness, preference falsification, and AI alignment

jessicata28 Oct 2021 0:16 UTC
52 points
28 comments13 min readLW link

What does davidad want from «boundaries»?

6 Feb 2024 17:45 UTC
41 points
1 comment5 min readLW link

Value is Fragile

Eliezer Yudkowsky29 Jan 2009 8:46 UTC
167 points
108 comments6 min readLW link

The Gift We Give To Tomorrow

Eliezer Yudkowsky17 Jul 2008 6:07 UTC
145 points
99 comments8 min readLW link

A foundation model approach to value inference

sen21 Feb 2023 5:09 UTC
6 points
0 comments3 min readLW link

Converging toward a Million Worlds

Joe Kwon24 Dec 2021 21:33 UTC
11 points
1 comment3 min readLW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown16 Nov 2022 15:33 UTC
13 points
2 comments12 min readLW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg11 Feb 2022 22:23 UTC
5 points
1 comment10 min readLW link

Question 4: Implementing the control proposals

Cameron Berg13 Feb 2022 17:12 UTC
6 points
2 comments5 min readLW link

[Hebbian Natural Abstractions] Introduction

21 Nov 2022 20:34 UTC
34 points
3 comments4 min readLW link

Why No *Interesting* Unaligned Singularity?

David Udell20 Apr 2022 0:34 UTC
12 points
12 comments1 min readLW link

Just How Hard a Problem is Alignment?

Roger Dearnaley25 Feb 2023 9:00 UTC
−1 points
1 comment21 min readLW link

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah19 Oct 2019 0:30 UTC
60 points
12 comments15 min readLW link

The Unified Theory of Normative Ethics

Thane Ruthenis17 Jun 2022 19:55 UTC
8 points
0 comments6 min readLW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022 15:05 UTC
32 points
1 comment14 min readLW link

Research Notes: What are we aligning for?

Shoshannah Tekofsky8 Jul 2022 22:13 UTC
19 points
8 comments2 min readLW link

Shard Theory—is it true for humans?

Rishika14 Jun 2024 19:21 UTC
66 points
7 comments15 min readLW link

Where Utopias Go Wrong, or: The Four Little Planets

ExCeph27 May 2022 1:24 UTC
15 points
0 comments11 min readLW link

A (paraconsistent) logic to deal with inconsistent preferences

B Jacobs14 Jul 2024 11:17 UTC
6 points
2 comments4 min readLW link

Musings of a Layman: Technology, AI, and the Human Condition

Crimson Liquidity15 Jul 2024 18:40 UTC
−2 points
0 comments8 min readLW link

Value learning in the absence of ground truth

Joel_Saarinen5 Feb 2024 18:56 UTC
47 points
8 comments45 min readLW link

Don’t be a Maxi

Cole Killian31 Jul 2022 23:59 UTC
15 points
7 comments2 min readLW link

What’s wrong with simplicity of value?

Wei Dai27 Jul 2011 3:09 UTC
29 points
40 comments1 min readLW link

How to respond to the recent condemnations of the rationalist community

Christopher King4 Apr 2023 1:42 UTC
−2 points
7 comments4 min readLW link

Alien Axiology

snerx20 Apr 2023 0:27 UTC
3 points
2 comments5 min readLW link

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd22 Apr 2023 10:06 UTC
−7 points
0 comments4 min readLW link

Human wanting

TsviBT24 Oct 2023 1:05 UTC
51 points
1 comment10 min readLW link

Content generation. Where do we draw the line?

Q Home9 Aug 2022 10:51 UTC
6 points
7 comments2 min readLW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea4 Nov 2023 17:34 UTC
27 points
0 comments1 min readLW link

Broad Picture of Human Values

Thane Ruthenis20 Aug 2022 19:42 UTC
42 points
6 comments10 min readLW link

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

15 Nov 2023 16:00 UTC
34 points
8 comments24 min readLW link

My critique of Eliezer’s deeply irrational beliefs

Jorterder16 Nov 2023 0:34 UTC
−33 points
1 comment9 min readLW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley17 Nov 2023 20:55 UTC
14 points
8 comments15 min readLW link

2. AIs as Economic Agents

RogerDearnaley23 Nov 2023 7:07 UTC
9 points
2 comments6 min readLW link

Preserving our heritage: Building a movement and a knowledge ark for current and future generations

rnk829 Nov 2023 19:20 UTC
0 points
5 comments12 min readLW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI17 May 2023 1:50 UTC
−13 points
11 comments19 min readLW link

[Question] “Fragility of Value” vs. LLMs

Not Relevant13 Apr 2022 2:02 UTC
34 points
33 comments1 min readLW link

The Intrinsic Interplay of Human Values and Artificial Intelligence: Navigating the Optimization Challenge

Joe Kwon5 Jun 2023 20:41 UTC
2 points
1 comment18 min readLW link

If I ran the zoo

Optimization Process5 Jan 2024 5:14 UTC
18 points
0 comments2 min readLW link

Aligned Objectives Prize Competition

Prometheus15 Jun 2023 12:42 UTC
8 points
0 comments2 min readLW link

Group Prioritarianism: Why AI Should Not Replace Humanity [draft]

fsh15 Jun 2023 17:33 UTC
8 points
0 comments25 min readLW link

Complex Behavior from Simple (Sub)Agents

moridinamael10 May 2019 21:44 UTC
113 points
13 comments9 min readLW link1 review
