
Human Values

Last edit: 16 Sep 2021 14:50 UTC by plex

Human values are the things we care about and would want an aligned superintelligence to look after and support. True human values are suspected to be highly complex, and they could be extrapolated into a wide variety of forms.

The shard theory of human values

4 Sep 2022 4:28 UTC
261 points
67 comments24 min readLW link2 reviews

Multi-agent predictive minds and AI alignment

Jan_Kulveit12 Dec 2018 23:48 UTC
63 points
18 comments10 min readLW link

Human values & biases are inaccessible to the genome

TurnTrout7 Jul 2022 17:29 UTC
95 points
54 comments6 min readLW link1 review

3. Uploading

RogerDearnaley23 Nov 2023 7:39 UTC
21 points
5 comments8 min readLW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley1 Feb 2024 21:15 UTC
16 points
15 comments13 min readLW link

Utilitarianism and the replaceability of desires and attachments

MichaelStJules27 Jul 2024 1:57 UTC
5 points
2 comments12 min readLW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley4 Dec 2023 18:31 UTC
12 points
0 comments49 min readLW link

Ends: An Introduction

Rob Bensinger11 Mar 2015 19:00 UTC
19 points
0 comments4 min readLW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley14 Feb 2024 7:10 UTC
41 points
12 comments31 min readLW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley27 Dec 2023 6:42 UTC
33 points
41 comments23 min readLW link

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin16 Jan 2019 13:59 UTC
52 points
3 comments15 min readLW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis20 Dec 2023 20:01 UTC
32 points
23 comments10 min readLW link

[Valence series] 2. Valence & Normativity

Steven Byrnes7 Dec 2023 16:43 UTC
88 points
7 comments28 min readLW link1 review

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley24 Nov 2023 4:56 UTC
10 points
0 comments4 min readLW link

Review: Foragers, Farmers, and Fossil Fuels

L Rudolf L2 Sep 2021 17:59 UTC
28 points
7 comments25 min readLW link
(strataoftheworld.blogspot.com)

How evolution succeeds and fails at value alignment

Ocracoke21 Aug 2022 7:14 UTC
21 points
2 comments4 min readLW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay26 Oct 2022 1:24 UTC
1 point
10 comments3 min readLW link

Shard Theory: An Overview

David Udell11 Aug 2022 5:44 UTC
167 points
34 comments10 min readLW link

Brain-over-body biases, and the embodied value problem in AI alignment

geoffreymiller24 Sep 2022 22:24 UTC
10 points
6 comments25 min readLW link

AI alignment with humans… but with which humans?

geoffreymiller9 Sep 2022 18:21 UTC
12 points
33 comments3 min readLW link

Ontological Crisis in Humans

Wei Dai18 Dec 2012 17:32 UTC
92 points
69 comments4 min readLW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

19 Sep 2024 22:22 UTC
51 points
48 comments5 min readLW link

Four Types of Disagreement

silentbob13 Apr 2025 11:22 UTC
50 points
4 comments5 min readLW link

Notes on Righteousness and Megalopsychia

David Gross7 Jul 2025 15:18 UTC
12 points
0 comments31 min readLW link

Utilons vs. Hedons

Psychohistorian10 Aug 2009 19:20 UTC
40 points
119 comments6 min readLW link

[Question] Does the existence of shared human values imply alignment is “easy”?

Morpheus26 Sep 2022 18:01 UTC
7 points
15 comments1 min readLW link

My Model Of EA Burnout

LoganStrohl25 Jan 2023 17:52 UTC
263 points
50 comments5 min readLW link1 review

Descriptive vs. specifiable values

TsviBT26 Mar 2023 9:10 UTC
17 points
2 comments2 min readLW link

Worse than an unaligned AGI

Shmi10 Apr 2022 3:35 UTC
−1 points
11 comments1 min readLW link

What Does It Mean to Align AI With Human Values?

Algon13 Dec 2022 16:56 UTC
8 points
3 comments1 min readLW link
(www.quantamagazine.org)

Humans provide an untapped wealth of evidence about alignment

14 Jul 2022 2:31 UTC
213 points
94 comments9 min readLW link1 review

Where does Sonnet 4.5′s desire to “not get too comfortable” come from?

Kaj_Sotala4 Oct 2025 10:19 UTC
91 points
16 comments64 min readLW link

Normativity

abramdemski18 Nov 2020 16:52 UTC
47 points
11 comments9 min readLW link

Value Notion—Questions to Ask

aysajan17 Jan 2022 15:35 UTC
5 points
0 comments4 min readLW link

Why the Problem of the Criterion Matters

Gordon Seidoh Worley30 Oct 2021 20:44 UTC
24 points
9 comments8 min readLW link

Notes on Judgment and Righteous Anger

David Gross30 Jan 2021 19:31 UTC
13 points
1 comment7 min readLW link

Silliness

lsusr3 Jun 2022 4:59 UTC
20 points
1 comment1 min readLW link

The Computational Anatomy of Human Values

beren6 Apr 2023 10:33 UTC
74 points
30 comments30 min readLW link

Book Review: A Pattern Language by Christopher Alexander

lincolnquirk15 Oct 2021 1:11 UTC
57 points
8 comments2 min readLW link1 review

October The First Is Too Late

gwern13 May 2025 21:45 UTC
61 points
10 comments1 min readLW link
(gwern.net)

Positive values seem more robust and lasting than prohibitions

TurnTrout17 Dec 2022 21:43 UTC
52 points
13 comments2 min readLW link

Which values are stable under ontology shifts?

Richard_Ngo23 Jul 2022 2:40 UTC
75 points
48 comments3 min readLW link
(thinkingcomplete.blogspot.com)

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong28 Feb 2018 16:55 UTC
10 points
3 comments1 min readLW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c28 Jun 2023 20:44 UTC
29 points
0 comments1 min readLW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong28 Feb 2018 16:55 UTC
10 points
4 comments2 min readLW link

Understanding and avoiding value drift

TurnTrout9 Sep 2022 4:16 UTC
48 points
14 comments6 min readLW link

[Question] What are the best arguments for/against AIs being “slightly ‘nice’”?

Raemon24 Sep 2024 2:00 UTC
102 points
62 comments31 min readLW link

A short dialogue on comparability of values

cousin_it20 Dec 2023 14:08 UTC
27 points
7 comments1 min readLW link

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm24 Oct 2023 13:53 UTC
11 points
0 comments1 min readLW link

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross13 Sep 2023 4:17 UTC
7 points
0 comments4 min readLW link

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter15 Jul 2025 15:10 UTC
4 points
0 comments1 min readLW link

Mental subagent implications for AI Safety

moridinamael3 Jan 2021 18:59 UTC
11 points
0 comments3 min readLW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong24 Oct 2017 12:03 UTC
3 points
1 comment4 min readLW link

Data for IRL: What is needed to learn human values?

j_we3 Oct 2022 9:23 UTC
18 points
6 comments12 min readLW link

Notes on Temperance

David Gross9 Nov 2020 2:33 UTC
15 points
2 comments9 min readLW link

The heterogeneity of human value types: Implications for AI alignment

geoffreymiller23 Sep 2022 17:03 UTC
10 points
2 comments10 min readLW link

A broad basin of attraction around human values?

Wei Dai12 Apr 2022 5:15 UTC
118 points
18 comments2 min readLW link

Trading off Lives

jefftk3 Jan 2024 3:40 UTC
53 points
12 comments2 min readLW link
(www.jefftk.com)

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout29 Nov 2022 6:23 UTC
62 points
41 comments15 min readLW link

Shut Up and Divide?

Wei Dai9 Feb 2010 20:09 UTC
123 points
276 comments1 min readLW link

It’s OK to be biased towards humans

dr_s11 Nov 2023 11:59 UTC
54 points
69 comments6 min readLW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright1 Jun 2022 18:45 UTC
1 point
6 comments1 min readLW link

The grass is always greener in the environment that shaped your values

Karl Faulks17 Nov 2024 18:00 UTC
8 points
0 comments3 min readLW link

1. Meet the Players: Value Diversity

Allison Duettmann2 Jan 2025 19:00 UTC
32 points
2 comments11 min readLW link

Valuism—an approach to life for you to consider

spencerg19 Jul 2023 15:23 UTC
17 points
2 comments1 min readLW link

[Question] How path-dependent are human values?

Ege Erdil15 Apr 2022 9:34 UTC
14 points
13 comments2 min readLW link

Upcoming stability of values

Stuart_Armstrong15 Mar 2018 11:36 UTC
15 points
15 comments2 min readLW link

Learning societal values from law as part of an AGI alignment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’

Wei Dai11 Oct 2020 18:07 UTC
44 points
18 comments1 min readLW link
(www.nytimes.com)

Modeling humans: what’s the point?

Charlie Steiner10 Nov 2020 1:30 UTC
10 points
1 comment3 min readLW link

Ordinary human life

David Hugh-Jones17 Dec 2022 16:46 UTC
24 points
3 comments14 min readLW link
(wyclif.substack.com)

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley6 Jul 2024 1:23 UTC
64 points
41 comments24 min readLW link

Values Are Real Like Harry Potter

9 Oct 2024 23:42 UTC
88 points
21 comments5 min readLW link

Would I think for ten thousand years?

Stuart_Armstrong11 Feb 2019 19:37 UTC
28 points
13 comments1 min readLW link

Human Nature, ASI alignment and Extinction

Ismael Tagle Díaz20 Jul 2025 23:36 UTC
1 point
0 comments1 min readLW link

Pleasure and suffering are not conceptual opposites

MichaelStJules11 Aug 2024 18:32 UTC
7 points
0 comments1 min readLW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

How to respond to the recent condemnations of the rationalist community

Christopher King4 Apr 2023 1:42 UTC
−2 points
7 comments4 min readLW link

Where Utopias Go Wrong, or: The Four Little Planets

ExCeph27 May 2022 1:24 UTC
15 points
0 comments11 min readLW link
(ginnungagapfoundation.wordpress.com)

NeuroAI for AI safety: A Differential Path

16 Dec 2024 13:17 UTC
22 points
0 comments7 min readLW link
(arxiv.org)

Uncovering Latent Human Wellbeing in LLM Embeddings

14 Sep 2023 1:40 UTC
32 points
7 comments8 min readLW link
(far.ai)

Not for the Sake of Selfishness Alone

lukeprog2 Jul 2011 17:37 UTC
34 points
20 comments8 min readLW link

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas3 Jan 2025 4:24 UTC
18 points
3 comments8 min readLW link
(docs.google.com)

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd22 Apr 2023 10:06 UTC
−7 points
0 comments4 min readLW link

Democratic Fine-Tuning

Joe Edelman29 Aug 2023 18:13 UTC
22 points
2 comments1 min readLW link
(open.substack.com)

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

Preserving our heritage: Building a movement and a knowledge ark for current and future generations

rnk829 Nov 2023 19:20 UTC
0 points
5 comments12 min readLW link

The Gift We Give To Tomorrow

Eliezer Yudkowsky17 Jul 2008 6:07 UTC
163 points
101 comments8 min readLW link

In Praise of Maximizing – With Some Caveats

David Althaus15 Mar 2015 19:40 UTC
32 points
19 comments10 min readLW link

Invisible Frameworks

Eliezer Yudkowsky22 Aug 2008 3:36 UTC
27 points
47 comments6 min readLW link

[Hebbian Natural Abstractions] Introduction

21 Nov 2022 20:34 UTC
34 points
3 comments4 min readLW link
(www.snellessen.com)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chris Lakin3 Jan 2024 17:55 UTC
48 points
3 comments3 min readLW link

Content generation. Where do we draw the line?

Q Home9 Aug 2022 10:51 UTC
6 points
7 comments2 min readLW link

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC
3 points
2 comments3 min readLW link
(medium.com)

[Question] Exploring Values in the Future of AI and Humanity: A Path Forward

Lucian&Sage19 Oct 2024 23:37 UTC
1 point
0 comments5 min readLW link

A (paraconsistent) logic to deal with inconsistent preferences

B Jacobs14 Jul 2024 11:17 UTC
6 points
2 comments4 min readLW link
(bobjacobs.substack.com)

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah19 Oct 2019 0:30 UTC
60 points
12 comments15 min readLW link
(mailchi.mp)

[Hebbian Natural Abstractions] Mathematical Foundations

25 Dec 2022 20:58 UTC
15 points
2 comments6 min readLW link
(www.snellessen.com)

Contra Steiner on Too Many Natural Abstractions

DragonGod24 Dec 2022 17:42 UTC
10 points
6 comments1 min readLW link

Should AI learn human values, human norms or something else?

Q Home17 Sep 2022 6:19 UTC
5 points
1 comment4 min readLW link

Just How Hard a Problem is Alignment?

Roger Dearnaley25 Feb 2023 9:00 UTC
3 points
1 comment21 min readLW link

Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research

kenneth_diao30 Sep 2024 18:37 UTC
2 points
0 comments12 min readLW link

Musings of a Layman: Technology, AI, and the Human Condition

Crimson Liquidity15 Jul 2024 18:40 UTC
−2 points
0 comments8 min readLW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI17 May 2023 1:50 UTC
−13 points
11 comments19 min readLW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea4 Nov 2023 17:34 UTC
27 points
0 comments1 min readLW link
(arxiv.org)

Agent membranes/boundaries and formalizing “safety”

Chris Lakin3 Jan 2024 17:55 UTC
26 points
46 comments3 min readLW link

Please Understand

samhealy1 Apr 2024 12:33 UTC
28 points
11 comments6 min readLW link

Thought experiment: coarse-grained VR utopia

cousin_it14 Jun 2017 8:03 UTC
27 points
48 comments1 min readLW link

Selfishness, preference falsification, and AI alignment

jessicata28 Oct 2021 0:16 UTC
52 points
28 comments13 min readLW link
(unstableontology.com)

The case against “The case against AI alignment”

KvmanThinking19 Mar 2025 22:40 UTC
1 point
0 comments1 min readLW link

The Intrinsic Interplay of Human Values and Artificial Intelligence: Navigating the Optimization Challenge

Joe Kwon5 Jun 2023 20:41 UTC
2 points
1 comment18 min readLW link

Aligned Objectives Prize Competition

Prometheus15 Jun 2023 12:42 UTC
8 points
0 comments2 min readLW link
(app.impactmarkets.io)

Human values differ as much as values can differ

PhilGoetz3 May 2010 19:35 UTC
27 points
220 comments7 min readLW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022 15:05 UTC
32 points
1 comment14 min readLW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír30 Jan 2025 10:58 UTC
5 points
14 comments10 min readLW link
(tetherware.substack.com)

The Unified Theory of Normative Ethics

Thane Ruthenis17 Jun 2022 19:55 UTC
8 points
0 comments6 min readLW link

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley17 Nov 2023 20:55 UTC
17 points
8 comments15 min readLW link

Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev21 Nov 2024 22:45 UTC
−3 points
2 comments5 min readLW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown16 Nov 2022 15:33 UTC
13 points
2 comments12 min readLW link
(sambrown.eu)

[Question] “Fragility of Value” vs. LLMs

Not Relevant13 Apr 2022 2:02 UTC
34 points
33 comments1 min readLW link

Antagonistic AI

Xybermancer1 Mar 2024 18:50 UTC
−8 points
1 comment1 min readLW link

A foundation model approach to value inference

sen21 Feb 2023 5:09 UTC
6 points
0 comments3 min readLW link

Value is Fragile

Eliezer Yudkowsky29 Jan 2009 8:46 UTC
175 points
109 comments6 min readLW link

Values Form a Shifting Landscape (and why you might care)

VojtaKovarik5 Dec 2020 23:56 UTC
29 points
6 comments4 min readLW link

Intelligence–Agency Equivalence ≈ Mass–Energy Equivalence: On Static Nature of Intelligence & Physicalization of Ethics

ank22 Feb 2025 0:12 UTC
1 point
0 comments6 min readLW link

Partial Identifiability in Reward Learning

Joar Skalse28 Feb 2025 19:23 UTC
16 points
0 comments12 min readLW link

The Paradox of Low Fertility

Zero Contradictions24 May 2025 0:59 UTC
−9 points
6 comments1 min readLW link
(expandingrationality.substack.com)

Group Prioritarianism: Why AI Should Not Replace Humanity [draft]

fsh15 Jun 2023 17:33 UTC
8 points
0 comments25 min readLW link

What’s wrong with simplicity of value?

Wei Dai27 Jul 2011 3:09 UTC
29 points
40 comments1 min readLW link

Wagering on Will And Worth (Pascals Wager for Free Will and Value)

Robert Cousineau27 Nov 2024 0:43 UTC
−1 points
2 comments3 min readLW link

If we can educate AIs, why not apply that education to people?

P. João22 Aug 2025 14:04 UTC
5 points
0 comments2 min readLW link

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren25 Feb 2021 22:06 UTC
5 points
2 comments1 min readLW link

AGI will know: Humans are not Rational

HumaneAutomation20 Mar 2023 18:46 UTC
0 points
10 comments2 min readLW link

[Question] [DISC] Are Values Robust?

DragonGod21 Dec 2022 1:00 UTC
12 points
9 comments2 min readLW link

What does davidad want from «boundaries»?

6 Feb 2024 17:45 UTC
47 points
1 comment5 min readLW link

“Wanting” and “liking”

Mateusz Bagiński30 Aug 2023 14:52 UTC
23 points
3 comments29 min readLW link

Alignment via prosocial brain algorithms

Cameron Berg12 Sep 2022 13:48 UTC
45 points
30 comments6 min readLW link

Should Art Carry the Weight of Shaping our Values?

Krishna Maneesha Dendukuri28 Jan 2025 18:43 UTC
2 points
0 comments3 min readLW link

Is the Endowment Effect Due to Incomparability?

Kevin Dorst10 Jul 2023 16:26 UTC
21 points
10 comments7 min readLW link
(kevindorst.substack.com)

Don’t want Goodhart? — Specify the variables more

YanLyutnev21 Nov 2024 22:43 UTC
2 points
2 comments5 min readLW link

Converging toward a Million Worlds

Joe Kwon24 Dec 2021 21:33 UTC
11 points
1 comment3 min readLW link

This Is Not Life

samhealy28 Jul 2025 8:43 UTC
55 points
2 comments23 min readLW link

Relational Design Can’t Be Left to Chance

Priyanka Bharadwaj22 Jun 2025 15:32 UTC
5 points
0 comments3 min readLW link

The Digital Asymmetry: A Call for Memory-Less AI and Human Cognitive Evolution

Full-Embodied Chaos30 Apr 2025 18:19 UTC
1 point
0 comments4 min readLW link

Explanations as Building Blocks of Human Mind

pavi18 Oct 2024 21:38 UTC
1 point
0 comments1 min readLW link

My critique of Eliezer’s deeply irrational beliefs

Jorterder16 Nov 2023 0:34 UTC
−35 points
1 comment9 min readLW link
(docs.google.com)

Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable

James Stephen Brown19 Dec 2024 4:45 UTC
5 points
0 comments2 min readLW link
(nonzerosum.games)

Post AGI effect prediction

Juliezhanggg1 Feb 2025 21:16 UTC
1 point
0 comments7 min readLW link

Building as gardening

Itay Dreyfus5 Jun 2025 6:41 UTC
3 points
1 comment4 min readLW link
(productidentity.co)

Other Papers About the Theory of Reward Learning

Joar Skalse28 Feb 2025 19:26 UTC
16 points
0 comments5 min readLW link

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

25 Sep 2023 18:55 UTC
3 points
2 comments3 min readLW link
(www.sentienceinstitute.org)

Alien Axiology

snerx20 Apr 2023 0:27 UTC
3 points
2 comments5 min readLW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges.

Roland Pihlakas12 Jan 2025 3:37 UTC
47 points
7 comments12 min readLW link

The Perfection Trap: How Formally Aligned AI Systems May Create Inescapable Ethical Dystopias

Chris O'Quinn1 Jun 2025 23:12 UTC
1 point
0 comments43 min readLW link

Sequence overview: Welfare and moral weights

MichaelStJules15 Aug 2024 4:22 UTC
7 points
0 comments1 min readLW link

Taking nonlogical concepts seriously

Kris Brown15 Oct 2024 18:16 UTC
7 points
5 comments18 min readLW link
(topos.site)

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg11 Feb 2022 22:23 UTC
5 points
1 comment10 min readLW link

Should Effective Altruists be Valuists instead of utilitarians?

25 Sep 2023 14:03 UTC
1 point
3 comments6 min readLW link

Preference synthesis illustrated: Star Wars

Stuart_Armstrong9 Jan 2020 16:47 UTC
20 points
8 comments3 min readLW link

Terminal Bias

[deleted]30 Jan 2012 21:03 UTC
24 points
125 comments6 min readLW link

First Certified Public Solve of Observer’s False Path Instability — Level 4 (Advanced Variant) — Walter Tarantelli — 2025-05-30 UTC

Walter Tarantelli31 May 2025 1:41 UTC
1 point
0 comments2 min readLW link

Research Notes: What are we aligning for?

Shoshannah Tekofsky8 Jul 2022 22:13 UTC
19 points
8 comments2 min readLW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João28 Aug 2025 16:37 UTC
3 points
0 comments7 min readLW link

Why No *Interesting* Unaligned Singularity?

David Udell20 Apr 2022 0:34 UTC
12 points
12 comments1 min readLW link

Human wanting

TsviBT24 Oct 2023 1:05 UTC
53 points
1 comment10 min readLW link

Defining and Characterising Reward Hacking

Joar Skalse28 Feb 2025 19:25 UTC
15 points
0 comments4 min readLW link

Nobody Asks the Monkey: Why Human Agency Matters in the AI Age

Miloš Borenović3 Dec 2024 14:16 UTC
1 point
0 comments2 min readLW link
(open.substack.com)

Looking for humanness in the world wide social

Itay Dreyfus15 Jan 2025 14:50 UTC
11 points
0 comments6 min readLW link
(productidentity.co)

Impossibility of Anthropocentric-Alignment

False Name24 Feb 2024 18:31 UTC
−8 points
2 comments39 min readLW link

2. AIs as Economic Agents

RogerDearnaley23 Nov 2023 7:07 UTC
9 points
2 comments6 min readLW link

To Raemon: bet in My (personal) Goals

P. João31 Aug 2025 15:48 UTC
3 points
0 comments3 min readLW link

The Alignment Problem No One Is Talking About

James Stephen Brown10 May 2024 18:34 UTC
10 points
10 comments2 min readLW link
(nonzerosum.games)

Language and My Frustration Continue in Our RSI

TristanTrim26 Mar 2025 14:13 UTC
2 points
1 comment7 min readLW link

The Dual-Path Framework: A Non-Paternalistic Approach to AGI Alignment That Respects Human Choice

JoeTruax2 Oct 2025 15:57 UTC
1 point
0 comments3 min readLW link

Information and Ethical Value: A Framework for Information Loss and Survival

yun dong12 Sep 2025 3:26 UTC
−1 points
0 comments6 min readLW link

Question 4: Implementing the control proposals

Cameron Berg13 Feb 2022 17:12 UTC
6 points
2 comments5 min readLW link

Are we the Wolves now? Human Eugenics under AI Control

Brit30 Jan 2025 8:31 UTC
−1 points
2 comments2 min readLW link

Broad Picture of Human Values

Thane Ruthenis20 Aug 2022 19:42 UTC
42 points
6 comments10 min readLW link

A Lived Alignment Loop: Symbolic Emergence and Emotional Coherence from Unstructured ChatGPT Reflection

BradCL17 Jun 2025 0:11 UTC
1 point
0 comments2 min readLW link

Inner Goodness

Eliezer Yudkowsky23 Oct 2008 22:19 UTC
27 points
31 comments7 min readLW link

Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs

Roland Pihlakas22 Jun 2025 18:16 UTC
17 points
0 comments7 min readLW link

A Critique of “Utility”

Zero Contradictions20 Mar 2025 23:21 UTC
−2 points
10 comments2 min readLW link
(thewaywardaxolotl.blogspot.com)

Sam Harris’s Argument For Objective Morality

Zero Contradictions5 Dec 2024 10:19 UTC
7 points
5 comments1 min readLW link
(thewaywardaxolotl.blogspot.com)

What are Humans, God or Devil?

Learning Elder27 Apr 2025 12:58 UTC
0 points
0 comments1 min readLW link

Shard Theory—is it true for humans?

Rishika14 Jun 2024 19:21 UTC
71 points
7 comments15 min readLW link

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

15 Nov 2023 16:00 UTC
35 points
8 comments24 min readLW link

If I ran the zoo

Optimization Process5 Jan 2024 5:14 UTC
18 points
1 comment2 min readLW link

The Dark Side of Cognition Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Quick thoughts on empathic metaethics

lukeprog12 Dec 2017 21:46 UTC
29 points
0 comments9 min readLW link

Everything you care about is in the map

Tahp17 Dec 2024 14:05 UTC
17 points
27 comments3 min readLW link

Value learning in the absence of ground truth

Joel_Saarinen5 Feb 2024 18:56 UTC
47 points
8 comments45 min readLW link

Why Death Makes Us Human

Yasha Sheynin26 Aug 2025 14:17 UTC
1 point
0 comments9 min readLW link

Complex Behavior from Simple (Sub)Agents

moridinamael10 May 2019 21:44 UTC
113 points
14 comments9 min readLW link1 review

Problems with Robin Hanson’s Quillette Article On AI

DaemonicSigil6 Aug 2023 22:13 UTC
89 points
33 comments8 min readLW link

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

16 Mar 2025 23:23 UTC
45 points
8 comments11 min readLW link

Everything You Want Is Learned (And That Changes Everything)

gchu18 Jun 2025 20:13 UTC
1 point
0 comments7 min readLW link