RSS

Agent Foundations

Tag

Why Agent Foun­da­tions? An Overly Ab­stract Explanation

johnswentworth25 Mar 2022 23:17 UTC
292 points
56 comments8 min readLW link1 review

Embed­ded Agency (full-text ver­sion)

15 Nov 2018 19:49 UTC
180 points
17 comments54 min readLW link

The Rocket Align­ment Problem

Eliezer Yudkowsky4 Oct 2018 0:38 UTC
215 points
41 comments15 min readLW link2 reviews

Un­der­stand­ing In­fra-Bayesi­anism: A Begin­ner-Friendly Video Series

22 Sep 2022 13:25 UTC
140 points
6 comments2 min readLW link

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC
35 points
4 comments2 min readLW link

Align­ment has a Basin of At­trac­tion: Beyond the Orthog­o­nal­ity Thesis

RogerDearnaley1 Feb 2024 21:15 UTC
3 points
15 comments13 min readLW link

Orthog­o­nal: A new agent foun­da­tions al­ign­ment organization

Tamsin Leake19 Apr 2023 20:17 UTC
206 points
3 comments1 min readLW link
(orxl.org)

Clar­ify­ing the Agent-Like Struc­ture Problem

johnswentworth29 Sep 2022 21:28 UTC
58 points
15 comments6 min readLW link

Why Si­mu­la­tor AIs want to be Ac­tive In­fer­ence AIs

10 Apr 2023 18:23 UTC
82 points
8 comments8 min readLW link

You won’t solve al­ign­ment with­out agent foundations

Mikhail Samin6 Nov 2022 8:07 UTC
24 points
3 comments8 min readLW link

Some Sum­maries of Agent Foun­da­tions Work

mattmacdermott15 May 2023 16:09 UTC
56 points
1 comment13 min readLW link

The Learn­ing-The­o­retic Agenda: Sta­tus 2023

Vanessa Kosoy19 Apr 2023 5:21 UTC
133 points
12 comments55 min readLW link

My take on agent foun­da­tions: for­mal­iz­ing metaphilo­soph­i­cal competence

zhukeepa1 Apr 2018 6:33 UTC
21 points
6 comments1 min readLW link

[Question] Cri­tiques of the Agent Foun­da­tions agenda?

Jsevillamol24 Nov 2020 16:11 UTC
16 points
3 comments1 min readLW link

an Evan­ge­lion di­alogue ex­plain­ing the QACI al­ign­ment plan

Tamsin Leake10 Jun 2023 3:28 UTC
45 points
15 comments42 min readLW link
(carado.moe)

for­mal­iz­ing the QACI al­ign­ment for­mal-goal

10 Jun 2023 3:28 UTC
53 points
6 comments14 min readLW link
(carado.moe)

Learn­ing-the­o­retic agenda read­ing list

Vanessa Kosoy9 Nov 2023 17:25 UTC
90 points
0 comments2 min readLW link

Game The­ory with­out Argmax [Part 1]

Cleo Nardo11 Nov 2023 15:59 UTC
53 points
16 comments19 min readLW link

Game The­ory with­out Argmax [Part 2]

Cleo Nardo11 Nov 2023 16:02 UTC
31 points
14 comments13 min readLW link

Public Call for In­ter­est in Math­e­mat­i­cal Alignment

Davidmanheim22 Nov 2023 13:22 UTC
89 points
9 comments1 min readLW link

What’s next for the field of Agent Foun­da­tions?

30 Nov 2023 17:55 UTC
59 points
21 comments10 min readLW link

Wild­fire of strategicness

TsviBT5 Jun 2023 13:59 UTC
36 points
19 comments1 min readLW link

Challenges with Break­ing into MIRI-Style Research

Chris_Leong17 Jan 2022 9:23 UTC
75 points
15 comments3 min readLW link

Refine­ment of Ac­tive In­fer­ence agency ontology

Roman Leventov15 Dec 2023 9:31 UTC
16 points
0 comments5 min readLW link
(arxiv.org)

Talk: “AI Would Be A Lot Less Alarm­ing If We Un­der­stood Agents”

johnswentworth17 Dec 2023 23:46 UTC
58 points
3 comments1 min readLW link
(www.youtube.com)

Mean­ing & Agency

abramdemski19 Dec 2023 22:27 UTC
90 points
17 comments14 min readLW link

Ideal­ized Agents Are Ap­prox­i­mate Causal Mir­rors (+ Rad­i­cal Op­ti­mism on Agent Foun­da­tions)

Thane Ruthenis22 Dec 2023 20:19 UTC
71 points
13 comments6 min readLW link

The Plan − 2023 Version

johnswentworth29 Dec 2023 23:34 UTC
142 points
39 comments31 min readLW link

A very non-tech­ni­cal ex­pla­na­tion of the ba­sics of in­fra-Bayesianism

matolcsid26 Apr 2023 22:57 UTC
62 points
9 comments9 min readLW link

Uncer­tainty in all its flavours

Cleo Nardo9 Jan 2024 16:21 UTC
25 points
6 comments35 min readLW link

In­ter­pret­ing Quan­tum Me­chan­ics in In­fra-Bayesian Physicalism

Yegreg12 Feb 2024 18:56 UTC
28 points
4 comments32 min readLW link

Con­se­quen­tial­ism is in the Stars not Ourselves

DragonGod24 Apr 2023 0:02 UTC
7 points
19 comments5 min readLW link

[Closed] Agent Foun­da­tions track in MATS

Vanessa Kosoy31 Oct 2023 8:12 UTC
54 points
1 comment1 min readLW link
(www.matsprogram.org)

Box in­ver­sion revisited

Jan_Kulveit7 Nov 2023 11:09 UTC
38 points
3 comments8 min readLW link

My re­search agenda in agent foundations

Alex_Altair28 Jun 2023 18:00 UTC
70 points
9 comments11 min readLW link

AXRP Epi­sode 25 - Co­op­er­a­tive AI with Cas­par Oesterheld

DanielFilan3 Oct 2023 21:50 UTC
43 points
0 comments92 min readLW link

(A → B) → A

Scott Garrabrant11 Sep 2018 22:38 UTC
70 points
11 comments2 min readLW link

Some AI re­search ar­eas and their rele­vance to ex­is­ten­tial safety

Andrew_Critch19 Nov 2020 3:18 UTC
204 points
37 comments50 min readLW link2 reviews

AXRP Epi­sode 15 - Nat­u­ral Ab­strac­tions with John Wentworth

DanielFilan23 May 2022 5:40 UTC
34 points
1 comment58 min readLW link

[Question] Does agent foun­da­tions cover all fu­ture ML sys­tems?

Jonas Hallgren25 Jul 2022 1:17 UTC
2 points
0 comments1 min readLW link

[Closed] Prize and fast track to al­ign­ment re­search at ALTER

Vanessa Kosoy17 Sep 2022 16:58 UTC
63 points
6 comments3 min readLW link

Con­tra “Strong Co­her­ence”

DragonGod4 Mar 2023 20:05 UTC
39 points
24 comments1 min readLW link

Com­po­si­tional lan­guage for hy­pothe­ses about computations

Vanessa Kosoy11 Mar 2023 19:43 UTC
26 points
2 comments11 min readLW link

Fixed points in mor­tal pop­u­la­tion games

ViktoriaMalyasova14 Mar 2023 7:10 UTC
24 points
0 comments12 min readLW link
(www.lesswrong.com)

[Question] Choice := An­throp­ics un­cer­tainty? And po­ten­tial im­pli­ca­tions for agency

Antoine de Scorraille21 Apr 2022 16:38 UTC
6 points
1 comment1 min readLW link

Re­quire­ments for a Basin of At­trac­tion to Alignment

RogerDearnaley14 Feb 2024 7:10 UTC
20 points
6 comments31 min readLW link

Un­der­stand­ing Selec­tion Theorems

adamk28 May 2022 1:49 UTC
41 points
3 comments7 min readLW link

7. Evolu­tion and Ethics

RogerDearnaley15 Feb 2024 23:38 UTC
2 points
6 comments6 min readLW link

Bridg­ing Ex­pected Utility Max­i­miza­tion and Optimization

Whispermute5 Aug 2022 8:18 UTC
25 points
5 comments14 min readLW link

Dis­cov­er­ing Agents

zac_kenton18 Aug 2022 17:33 UTC
73 points
11 comments6 min readLW link

Three Types of Con­straints in the Space of Agents

15 Jan 2024 17:27 UTC
26 points
3 comments17 min readLW link

In­ter­view with Vanessa Kosoy on the Value of The­o­ret­i­cal Re­search for AI

WillPetillo4 Dec 2023 22:58 UTC
35 points
0 comments35 min readLW link

Shal­low re­view of live agen­das in al­ign­ment & safety

27 Nov 2023 11:10 UTC
288 points
69 comments29 min readLW link

Re­ward is not Ne­c­es­sary: How to Create a Com­po­si­tional Self-Pre­serv­ing Agent for Life-Long Learning

Roman Leventov12 Jan 2023 16:43 UTC
17 points
2 comments2 min readLW link
(arxiv.org)

Goal al­ign­ment with­out al­ign­ment on episte­mol­ogy, ethics, and sci­ence is futile

Roman Leventov7 Apr 2023 8:22 UTC
20 points
2 comments2 min readLW link

Nor­ma­tive vs De­scrip­tive Models of Agency

mattmacdermott2 Feb 2023 20:28 UTC
26 points
5 comments4 min readLW link

In­tent-al­igned AI sys­tems de­plete hu­man agency: the need for agency foun­da­tions re­search in AI safety

catubc31 May 2023 21:18 UTC
24 points
4 comments11 min readLW link

Towards Mea­sures of Optimisation

12 May 2023 15:29 UTC
53 points
37 comments4 min readLW link

An Im­pos­si­bil­ity Proof Rele­vant to the Shut­down Prob­lem and Corrigibility

Audere2 May 2023 6:52 UTC
65 points
13 comments9 min readLW link

In­fra-Bayesi­anism nat­u­rally leads to the mono­ton­ic­ity prin­ci­ple, and I think this is a problem

matolcsid26 Apr 2023 21:39 UTC
16 points
6 comments4 min readLW link

Gear­ing Up for Long Timelines in a Hard World

Dalcy14 Jul 2023 6:11 UTC
11 points
0 comments4 min readLW link

Op­ti­mi­sa­tion Mea­sures: Desider­ata, Im­pos­si­bil­ity, Proposals

7 Aug 2023 15:52 UTC
35 points
9 comments1 min readLW link

A mostly crit­i­cal re­view of in­fra-Bayesianism

matolcsid28 Feb 2023 18:37 UTC
104 points
9 comments29 min readLW link

Perfor­mance guaran­tees in clas­si­cal learn­ing the­ory and in­fra-Bayesianism

matolcsid28 Feb 2023 18:37 UTC
9 points
4 comments31 min readLW link

Re­peated Play of Im­perfect New­comb’s Para­dox in In­fra-Bayesian Physicalism

Sven Nilsen3 Apr 2023 10:06 UTC
2 points
0 comments2 min readLW link

Another take on agent foun­da­tions: for­mal­iz­ing zero-shot reasoning

zhukeepa1 Jul 2018 6:12 UTC
60 points
20 comments12 min readLW link

Ar­gu­ments about Highly Reli­able Agent De­signs as a Use­ful Path to Ar­tifi­cial In­tel­li­gence Safety

27 Jan 2022 13:13 UTC
27 points
0 comments1 min readLW link
(arxiv.org)

100 Din­ners And A Work­shop: In­for­ma­tion Preser­va­tion And Goals

Stephen Fowler28 Mar 2023 3:13 UTC
8 points
0 comments7 min readLW link
No comments.