Agent Foundations

Why Agent Foundations? An Overly Abstract Explanation · johnswentworth · 25 Mar 2022 23:17 UTC · 310 points · 58 comments · 8 min read · 1 review
Embedded Agency (full-text version) · 15 Nov 2018 19:49 UTC · 210 points · 17 comments · 54 min read
The Rocket Alignment Problem · Eliezer Yudkowsky · 4 Oct 2018 0:38 UTC · 230 points · 44 comments · 15 min read · 2 reviews
Some Summaries of Agent Foundations Work · mattmacdermott · 15 May 2023 16:09 UTC · 62 points · 1 comment · 13 min read
Understanding Infra-Bayesianism: A Beginner-Friendly Video Series · 22 Sep 2022 13:25 UTC · 140 points · 6 comments · 2 min read
Orthogonal: A new agent foundations alignment organization · Tamsin Leake · 19 Apr 2023 20:17 UTC · 217 points · 4 comments · 1 min read · (orxl.org)
Striking Implications for Learning Theory, Interpretability — and Safety? · RogerDearnaley · 5 Jan 2024 8:46 UTC · 37 points · 4 comments · 2 min read
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis · RogerDearnaley · 1 Feb 2024 21:15 UTC · 16 points · 15 comments · 13 min read
Working through a small tiling result · James Payor · 13 May 2025 20:28 UTC · 66 points · 9 comments · 5 min read
You won’t solve alignment without agent foundations · Mikhail Samin · 6 Nov 2022 8:07 UTC · 29 points · 3 comments · 8 min read
Why Simulator AIs want to be Active Inference AIs · 10 Apr 2023 18:23 UTC · 96 points · 9 comments · 8 min read · 1 review
Clarifying the Agent-Like Structure Problem · johnswentworth · 29 Sep 2022 21:28 UTC · 63 points · 19 comments · 6 min read
0th Person and 1st Person Logic · Adele Lopez · 10 Mar 2024 0:56 UTC · 60 points · 28 comments · 6 min read
My take on agent foundations: formalizing metaphilosophical competence · zhukeepa · 1 Apr 2018 6:33 UTC · 21 points · 6 comments · 1 min read
Short Timelines Don’t Devalue Long Horizon Research · Vladimir_Nesov · 9 Apr 2025 0:42 UTC · 169 points · 24 comments · 1 min read
formalizing the QACI alignment formal-goal · 10 Jun 2023 3:28 UTC · 54 points · 6 comments · 13 min read · (carado.moe)
The Learning-Theoretic Agenda: Status 2023 · Vanessa Kosoy · 19 Apr 2023 5:21 UTC · 144 points · 22 comments · 56 min read · 3 reviews
Non-Monotonic Infra-Bayesian Physicalism · Marcus Ogren · 2 Apr 2025 12:14 UTC · 34 points · 0 comments · 18 min read
Time complexity for deterministic string machines · alcatal · 21 Apr 2024 22:35 UTC · 21 points · 2 comments · 21 min read
[Question] Critiques of the Agent Foundations agenda? · Jsevillamol · 24 Nov 2020 16:11 UTC · 16 points · 3 comments · 1 min read
Fixed points in mortal population games · ViktoriaMalyasova · 14 Mar 2023 7:10 UTC · 31 points · 0 comments · 12 min read · (www.lesswrong.com)
Empirical vs. Mathematical Joints of Nature · 26 Jun 2024 1:55 UTC · 35 points · 1 comment · 5 min read
Wildfire of strategicness · TsviBT · 5 Jun 2023 13:59 UTC · 38 points · 19 comments · 1 min read
Announcement: Learning Theory Online Course · 20 Jan 2025 19:55 UTC · 63 points · 33 comments · 4 min read
Live Theory Part 0: Taking Intelligence Seriously · Sahil · 26 Jun 2024 21:37 UTC · 101 points · 3 comments · 8 min read
Towards a formalization of the agent structure problem · Alex_Altair · 29 Apr 2024 20:28 UTC · 55 points · 6 comments · 14 min read
Proceedings of ILIAD: Lessons and Progress · 28 Apr 2025 19:04 UTC · 77 points · 5 comments · 8 min read
Come join Dovetail’s agent foundations fellowship talks & discussion · Alex_Altair · 15 Feb 2025 22:10 UTC · 24 points · 0 comments · 1 min read
A very non-technical explanation of the basics of infra-Bayesianism · David Matolcsi · 26 Apr 2023 22:57 UTC · 62 points · 9 comments · 9 min read
[Question] Does agent foundations cover all future ML systems? · Jonas Hallgren · 25 Jul 2022 1:17 UTC · 4 points · 0 comments · 1 min read
Uncertainty in all its flavours · Cleo Nardo · 9 Jan 2024 16:21 UTC · 34 points · 6 comments · 35 min read
Is alignment reducible to becoming more coherent? · Cole Wyeth · 22 Apr 2025 23:47 UTC · 19 points · 0 comments · 3 min read
Meaning & Agency · abramdemski · 19 Dec 2023 22:27 UTC · 93 points · 17 comments · 14 min read
[Question] Take over my project: do computable agents plan against the universal distribution pessimistically? · Cole Wyeth · 19 Feb 2025 20:17 UTC · 25 points · 3 comments · 3 min read
Video lectures on the learning-theoretic agenda · Vanessa Kosoy · 27 Oct 2024 12:01 UTC · 75 points · 0 comments · 1 min read · (www.youtube.com)
Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems · Thane Ruthenis · 18 Feb 2025 18:04 UTC · 32 points · 10 comments · 4 min read
Game Theory without Argmax [Part 2] · Cleo Nardo · 11 Nov 2023 16:02 UTC · 31 points · 14 comments · 13 min read
Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations) · Thane Ruthenis · 22 Dec 2023 20:19 UTC · 74 points · 14 comments · 6 min read
New Paper: Infra-Bayesian Decision-Estimation Theory · 10 Apr 2025 9:17 UTC · 77 points · 4 comments · 1 min read · (arxiv.org)
Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · 30 Nov 2021 22:25 UTC · 114 points · 23 comments · 42 min read · 1 review
Consequentialism is in the Stars not Ourselves · DragonGod · 24 Apr 2023 0:02 UTC · 7 points · 19 comments · 5 min read
Report & retrospective on the Dovetail fellowship · Alex_Altair · 14 Mar 2025 23:20 UTC · 26 points · 3 comments · 9 min read
Linear infra-Bayesian Bandits · Vanessa Kosoy · 10 May 2024 6:41 UTC · 40 points · 5 comments · 1 min read · (arxiv.org)
Interpreting Quantum Mechanics in Infra-Bayesian Physicalism · Yegreg · 12 Feb 2024 18:56 UTC · 30 points · 6 comments · 43 min read
[Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme · Vanessa Kosoy · 16 Feb 2025 16:24 UTC · 54 points · 5 comments · 2 min read
Formalizing the Informal (event invite) · abramdemski · 10 Sep 2024 19:22 UTC · 42 points · 0 comments · 1 min read
Talk: “AI Would Be A Lot Less Alarming If We Understood Agents” · johnswentworth · 17 Dec 2023 23:46 UTC · 58 points · 3 comments · 1 min read · (www.youtube.com)
[Closed] Prize and fast track to alignment research at ALTER · Vanessa Kosoy · 17 Sep 2022 16:58 UTC · 63 points · 8 comments · 3 min read
New Paper: Ambiguous Online Learning · Vanessa Kosoy · 25 Jun 2025 9:14 UTC · 30 points · 2 comments · 1 min read · (arxiv.org)
Glass box learners want to be black box · Cole Wyeth · 10 May 2025 11:05 UTC · 47 points · 10 comments · 4 min read
Challenges with Breaking into MIRI-Style Research · Chris_Leong · 17 Jan 2022 9:23 UTC · 75 points · 16 comments · 2 min read
Coherence of Caches and Agents · johnswentworth · 1 Apr 2024 23:04 UTC · 78 points · 9 comments · 11 min read
Game Theory without Argmax [Part 1] · Cleo Nardo · 11 Nov 2023 15:59 UTC · 70 points · 18 comments · 19 min read
Some AI research areas and their relevance to existential safety · Andrew_Critch · 19 Nov 2020 3:18 UTC · 206 points · 37 comments · 50 min read · 2 reviews
Learning-theoretic agenda reading list · Vanessa Kosoy · 9 Nov 2023 17:25 UTC · 103 points · 1 comment · 2 min read · 1 review
The Plan − 2023 Version · johnswentworth · 29 Dec 2023 23:34 UTC · 152 points · 40 comments · 31 min read · 1 review
(A → B) → A · Scott Garrabrant · 11 Sep 2018 22:38 UTC · 80 points · 11 comments · 2 min read
Hierarchical Agency: A Missing Piece in AI Alignment · Jan_Kulveit · 27 Nov 2024 5:49 UTC · 114 points · 22 comments · 11 min read
Leaving MIRI, Seeking Funding · abramdemski · 8 Aug 2024 18:32 UTC · 264 points · 19 comments · 2 min read
Work with me on agent foundations: independent fellowship · Alex_Altair · 21 Sep 2024 13:59 UTC · 59 points · 5 comments · 4 min read
What’s next for the field of Agent Foundations? · 30 Nov 2023 17:55 UTC · 59 points · 23 comments · 10 min read
Public Call for Interest in Mathematical Alignment · Davidmanheim · 22 Nov 2023 13:22 UTC · 90 points · 9 comments · 1 min read
Towards the Operationalization of Philosophy & Wisdom · Thane Ruthenis · 28 Oct 2024 19:45 UTC · 20 points · 2 comments · 33 min read · (aiimpacts.org)
Contra “Strong Coherence” · DragonGod · 4 Mar 2023 20:05 UTC · 39 points · 24 comments · 1 min read
Refinement of Active Inference agency ontology · Roman Leventov · 15 Dec 2023 9:31 UTC · 16 points · 0 comments · 5 min read · (arxiv.org)
AXRP Episode 15 - Natural Abstractions with John Wentworth · DanielFilan · 23 May 2022 5:40 UTC · 34 points · 1 comment · 58 min read
Box inversion revisited · Jan_Kulveit · 7 Nov 2023 11:09 UTC · 40 points · 3 comments · 8 min read
No, Futarchy Doesn’t Have an EDT Flaw · Mikhail Samin · 27 Jun 2025 9:27 UTC · 24 points · 25 comments · 2 min read
Agent Foundations 2025 at CMU · 19 Jan 2025 23:48 UTC · 90 points · 10 comments · 1 min read
My research agenda in agent foundations · Alex_Altair · 28 Jun 2023 18:00 UTC · 75 points · 9 comments · 11 min read
Compositional language for hypotheses about computations · Vanessa Kosoy · 11 Mar 2023 19:43 UTC · 38 points · 6 comments · 12 min read
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · 3 Oct 2023 21:50 UTC · 43 points · 0 comments · 92 min read
[Closed] Agent Foundations track in MATS · Vanessa Kosoy · 31 Oct 2023 8:12 UTC · 54 points · 1 comment · 1 min read · (www.matsprogram.org)
Most Minds are Irrational · Davidmanheim · 10 Dec 2024 9:36 UTC · 17 points · 4 comments · 10 min read
Deep Learning is cheap Solomonoff induction? · 7 Dec 2024 11:00 UTC · 45 points · 1 comment · 17 min read
UDT1.01: Logical Inductors and Implicit Beliefs (5/10) · Diffractor · 18 Apr 2024 8:39 UTC · 34 points · 2 comments · 19 min read
What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism · Brittany Gelb · 1 May 2025 19:06 UTC · 19 points · 0 comments · 7 min read
S-Expressions as a Design Language: A Tool for Deconfusion in Alignment · Johannes C. Mayer · 19 Jun 2025 19:03 UTC · 5 points · 0 comments · 6 min read
Ruling Out Lookup Tables · Alfred Harwood · 4 Feb 2025 10:39 UTC · 22 points · 11 comments · 7 min read
Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety · 27 Jan 2022 13:13 UTC · 27 points · 0 comments · 1 min read · (arxiv.org)
Proof Section to an Introduction to Reinforcement Learning for Understanding Infra-Bayesianism · Brittany Gelb · 17 May 2025 2:36 UTC · 3 points · 0 comments · 9 min read
Distilling the Internal Model Principle part II · JoseFaustino · 30 Apr 2025 17:56 UTC · 15 points · 0 comments · 19 min read
[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics? · Q Home · 22 Jan 2025 3:30 UTC · 5 points · 0 comments · 1 min read
Optimisation Measures: Desiderata, Impossibility, Proposals · 7 Aug 2023 15:52 UTC · 36 points · 9 comments · 1 min read
A mostly critical review of infra-Bayesianism · David Matolcsi · 28 Feb 2023 18:37 UTC · 108 points · 9 comments · 29 min read
Towards Measures of Optimisation · 12 May 2023 15:29 UTC · 53 points · 37 comments · 4 min read
Detect Goodhart and shut down · Jeremy Gillen · 22 Jan 2025 18:45 UTC · 70 points · 21 comments · 7 min read
An Introduction to Evidential Decision Theory · Babić · 2 Feb 2025 21:27 UTC · 5 points · 2 comments · 10 min read
Towards building blocks of ontologies · 8 Feb 2025 16:03 UTC · 29 points · 0 comments · 26 min read
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning · Roman Leventov · 12 Jan 2023 16:43 UTC · 17 points · 2 comments · 2 min read · (arxiv.org)
A New Framework for AI Alignment: A Philosophical Approach · niscalajyoti · 25 Jun 2025 2:41 UTC · 1 point · 0 comments · 1 min read · (archive.org)
Three Types of Constraints in the Space of Agents · 15 Jan 2024 17:27 UTC · 26 points · 3 comments · 17 min read
Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated) · ank · 11 Feb 2025 3:21 UTC · 13 points · 8 comments · 35 min read
Gearing Up for Long Timelines in a Hard World · Dalcy · 14 Jul 2023 6:11 UTC · 18 points · 0 comments · 4 min read
Abstractions are not Natural · Alfred Harwood · 4 Nov 2024 11:10 UTC · 25 points · 21 comments · 11 min read
Intelligence–Agency Equivalence ≈ Mass–Energy Equivalence: On Static Nature of Intelligence & Physicalization of Ethics · ank · 22 Feb 2025 0:12 UTC · 1 point · 0 comments · 6 min read
Rebuttals for ~all criticisms of AIXI · Cole Wyeth · 7 Jan 2025 17:41 UTC · 25 points · 17 comments · 14 min read
Understanding Selection Theorems · adamk · 28 May 2022 1:49 UTC · 41 points · 3 comments · 7 min read
Performance guarantees in classical learning theory and infra-Bayesianism · David Matolcsi · 28 Feb 2023 18:37 UTC · 9 points · 4 comments · 31 min read
Distilling the Internal Model Principle · JoseFaustino · 8 Feb 2025 14:59 UTC · 21 points · 0 comments · 16 min read
Can AI agents learn to be good? · Ram Rachum · 29 Aug 2024 14:20 UTC · 8 points · 0 comments · 1 min read · (futureoflife.org)
Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem · David Matolcsi · 26 Apr 2023 21:39 UTC · 22 points · 6 comments · 4 min read
Goal alignment without alignment on epistemology, ethics, and science is futile · Roman Leventov · 7 Apr 2023 8:22 UTC · 20 points · 2 comments · 2 min read
Bridging Expected Utility Maximization and Optimization · Daniel Herrmann · 5 Aug 2022 8:18 UTC · 25 points · 5 comments · 14 min read
Requirements for a Basin of Attraction to Alignment · RogerDearnaley · 14 Feb 2024 7:10 UTC · 41 points · 12 comments · 31 min read
Open Problems in AIXI Agent Foundations · Cole Wyeth · 12 Sep 2024 15:38 UTC · 42 points · 2 comments · 10 min read
A Generalization of the Good Regulator Theorem · Alfred Harwood · 4 Jan 2025 9:55 UTC · 20 points · 6 comments · 10 min read
Infra-Bayesian haggling · hannagabor · 20 May 2024 12:23 UTC · 28 points · 0 comments · 20 min read
Discovering Agents · zac_kenton · 18 Aug 2022 17:33 UTC · 73 points · 11 comments · 6 min read
Unaligned AGI & Brief History of Inequality · ank · 22 Feb 2025 16:26 UTC · −20 points · 4 comments · 7 min read
100 Dinners And A Workshop: Information Preservation And Goals · Stephen Fowler · 28 Mar 2023 3:13 UTC · 8 points · 0 comments · 7 min read
Half-baked idea: a straightforward method for learning environmental goals? · Q Home · 4 Feb 2025 6:56 UTC · 16 points · 7 comments · 5 min read
[Question] Choice := Anthropics uncertainty? And potential implications for agency · Antoine de Scorraille · 21 Apr 2022 16:38 UTC · 6 points · 1 comment · 1 min read
7. Evolution and Ethics · RogerDearnaley · 15 Feb 2024 23:38 UTC · 3 points · 7 comments · 6 min read
Repeated Play of Imperfect Newcomb’s Paradox in Infra-Bayesian Physicalism · Sven Nilsen · 3 Apr 2023 10:06 UTC · 2 points · 0 comments · 2 min read
Interview with Vanessa Kosoy on the Value of Theoretical Research for AI · WillPetillo · 4 Dec 2023 22:58 UTC · 37 points · 0 comments · 35 min read
Clarifying “wisdom”: Foundational topics for aligned AIs to prioritize before irreversible decisions · Anthony DiGiovanni · 20 Jun 2025 21:55 UTC · 37 points · 2 comments · 12 min read
Another take on agent foundations: formalizing zero-shot reasoning · zhukeepa · 1 Jul 2018 6:12 UTC · 64 points · 20 comments · 12 min read
An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility · Audere · 2 May 2023 6:52 UTC · 66 points · 13 comments · 9 min read
An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism · Brittany Gelb · 17 May 2025 2:34 UTC · 13 points · 0 comments · 20 min read
A Straightforward Explanation of the Good Regulator Theorem · Alfred Harwood · 18 Nov 2024 12:45 UTC · 81 points · 29 comments · 14 min read
Normative vs Descriptive Models of Agency · mattmacdermott · 2 Feb 2023 20:28 UTC · 26 points · 5 comments · 4 min read
What program structures enable efficient induction? · Daniel C · 5 Sep 2024 10:12 UTC · 23 points · 5 comments · 3 min read
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety · catubc · 31 May 2023 21:18 UTC · 26 points · 4 comments · 11 min read
Thou shalt not command an aligned AI · Martin Vlach · 11 May 2025 20:02 UTC · 0 points · 4 comments · 1 min read