Agent Foundations


Why Agent Foundations? An Overly Abstract Explanation

johnswentworth

25 Mar 2022 23:17 UTC

296 points

56 comments, 8 min read, 1 review

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

15 Nov 2018 19:49 UTC

184 points

17 comments, 54 min read

The Rocket Alignment Problem

Eliezer Yudkowsky

4 Oct 2018 0:38 UTC

217 points

41 comments, 15 min read, 2 reviews

Understanding Infra-Bayesianism: A Beginner-Friendly Video Series

Jack Parker and Connall Garrod

22 Sep 2022 13:25 UTC

140 points

6 comments, 2 min read

Orthogonal: A new agent foundations alignment organization

Tamsin Leake

19 Apr 2023 20:17 UTC

207 points

4 comments, 1 min read

(orxl.org)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley

1 Feb 2024 21:15 UTC

13 points

15 comments, 13 min read

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley

5 Jan 2024 8:46 UTC

36 points

4 comments, 2 min read

0th Person and 1st Person Logic

Adele Lopez

10 Mar 2024 0:56 UTC

55 points

28 comments, 6 min read

Some Summaries of Agent Foundations Work

mattmacdermott

15 May 2023 16:09 UTC

58 points

1 comment, 13 min read

You won’t solve alignment without agent foundations

Mikhail Samin

6 Nov 2022 8:07 UTC

24 points

3 comments, 8 min read

Clarifying the Agent-Like Structure Problem

johnswentworth

29 Sep 2022 21:28 UTC

59 points

15 comments, 6 min read

Why Simulator AIs want to be Active Inference AIs

Jan_Kulveit and rosehadshar

10 Apr 2023 18:23 UTC

90 points

8 comments, 8 min read

an Evangelion dialogue explaining the QACI alignment plan

Tamsin Leake

10 Jun 2023 3:28 UTC

50 points

15 comments, 43 min read

(carado.moe)

formalizing the QACI alignment formal-goal

Tamsin Leake and JuliaHP

10 Jun 2023 3:28 UTC

53 points

6 comments, 14 min read

(carado.moe)

My take on agent foundations: formalizing metaphilosophical competence

zhukeepa

1 Apr 2018 6:33 UTC

21 points

6 comments, 1 min read

[Question] Critiques of the Agent Foundations agenda?

Jsevillamol

24 Nov 2020 16:11 UTC

16 points

3 comments, 1 min read

The Learning-Theoretic Agenda: Status 2023

Vanessa Kosoy

19 Apr 2023 5:21 UTC

135 points

13 comments, 55 min read

Time complexity for deterministic string machines

alcatal

21 Apr 2024 22:35 UTC

21 points

0 comments, 21 min read

Fixed points in mortal population games

ViktoriaMalyasova

14 Mar 2023 7:10 UTC

31 points

0 comments, 12 min read

(www.lesswrong.com)

Consequentialism is in the Stars not Ourselves

DragonGod

24 Apr 2023 0:02 UTC

7 points

19 comments, 5 min read

[Closed] Agent Foundations track in MATS

Vanessa Kosoy

31 Oct 2023 8:12 UTC

54 points

1 comment, 1 min read

(www.matsprogram.org)

Meaning & Agency

abramdemski

19 Dec 2023 22:27 UTC

91 points

17 comments, 14 min read

Learning-theoretic agenda reading list

Vanessa Kosoy

9 Nov 2023 17:25 UTC

98 points

0 comments, 2 min read

Game Theory without Argmax [Part 1]

Cleo Nardo

11 Nov 2023 15:59 UTC

64 points

17 comments, 19 min read

Game Theory without Argmax [Part 2]

Cleo Nardo

11 Nov 2023 16:02 UTC

31 points

14 comments, 13 min read

Public Call for Interest in Mathematical Alignment

Davidmanheim

22 Nov 2023 13:22 UTC

89 points

9 comments, 1 min read

Refinement of Active Inference agency ontology

Roman Leventov

15 Dec 2023 9:31 UTC

16 points

0 comments, 5 min read

(arxiv.org)

What’s next for the field of Agent Foundations?

Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott

30 Nov 2023 17:55 UTC

59 points

23 comments, 10 min read

Uncertainty in all its flavours

Cleo Nardo

9 Jan 2024 16:21 UTC

27 points

6 comments, 35 min read

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworth

17 Dec 2023 23:46 UTC

58 points

3 comments, 1 min read

(www.youtube.com)

Wildfire of strategicness

TsviBT

5 Jun 2023 13:59 UTC

38 points

19 comments, 1 min read

My research agenda in agent foundations

Alex_Altair

28 Jun 2023 18:00 UTC

70 points

9 comments, 11 min read

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism

Yegreg

12 Feb 2024 18:56 UTC

30 points

6 comments, 32 min read

Coherence of Caches and Agents

johnswentworth

1 Apr 2024 23:04 UTC

74 points

7 comments, 11 min read

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan

3 Oct 2023 21:50 UTC

43 points

0 comments, 92 min read

Challenges with Breaking into MIRI-Style Research

Chris_Leong

17 Jan 2022 9:23 UTC

75 points

15 comments, 3 min read

Box inversion revisited

Jan_Kulveit

7 Nov 2023 11:09 UTC

41 points

3 comments, 8 min read

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor

18 Apr 2024 8:39 UTC

33 points

2 comments, 19 min read

Some AI research areas and their relevance to existential safety

Andrew_Critch

19 Nov 2020 3:18 UTC

204 points

37 comments, 50 min read, 2 reviews

Towards a formalization of the agent structure problem

Alex_Altair

29 Apr 2024 20:28 UTC

52 points

5 comments, 14 min read

AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

23 May 2022 5:40 UTC

34 points

1 comment, 58 min read

Linear infra-Bayesian Bandits

Vanessa Kosoy

10 May 2024 6:41 UTC

39 points

5 comments, 1 min read

(arxiv.org)

[Question] Does agent foundations cover all future ML systems?

Jonas Hallgren

25 Jul 2022 1:17 UTC

2 points

0 comments, 1 min read

Empirical vs. Mathematical Joints of Nature

Elizabeth and Alex_Altair

26 Jun 2024 1:55 UTC

35 points

1 comment, 5 min read

Live Theory Part 0: Taking Intelligence Seriously

Sahil

26 Jun 2024 21:37 UTC

76 points

3 comments, 12 min read

[Closed] Prize and fast track to alignment research at ALTER

Vanessa Kosoy

17 Sep 2022 16:58 UTC

63 points

8 comments, 3 min read

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis

22 Dec 2023 20:19 UTC

71 points

14 comments, 6 min read

The Plan − 2023 Version

johnswentworth

29 Dec 2023 23:34 UTC

146 points

39 comments, 31 min read

A very non-technical explanation of the basics of infra-Bayesianism

matolcsid

26 Apr 2023 22:57 UTC

62 points

9 comments, 9 min read

(A → B) → A

Scott Garrabrant

11 Sep 2018 22:38 UTC

70 points

11 comments, 2 min read

Contra “Strong Coherence”

DragonGod

4 Mar 2023 20:05 UTC

39 points

24 comments, 1 min read

Compositional language for hypotheses about computations

Vanessa Kosoy

11 Mar 2023 19:43 UTC

37 points

2 comments, 11 min read

A mostly critical review of infra-Bayesianism

matolcsid

28 Feb 2023 18:37 UTC

104 points

9 comments, 29 min read

Three Types of Constraints in the Space of Agents

Nora_Ammann and Mateusz Bagiński

15 Jan 2024 17:27 UTC

26 points

3 comments, 17 min read

7. Evolution and Ethics

RogerDearnaley

15 Feb 2024 23:38 UTC

3 points

6 comments, 6 min read

Requirements for a Basin of Attraction to Alignment

RogerDearnaley

14 Feb 2024 7:10 UTC

38 points

6 comments, 31 min read

Infra-Bayesian haggling

hannagabor

20 May 2024 12:23 UTC

18 points

0 comments, 20 min read

100 Dinners And A Workshop: Information Preservation And Goals

Stephen Fowler

28 Mar 2023 3:13 UTC

8 points

0 comments, 7 min read

Repeated Play of Imperfect Newcomb’s Paradox in Infra-Bayesian Physicalism

Sven Nilsen

3 Apr 2023 10:06 UTC

2 points

0 comments, 2 min read

Goal alignment without alignment on epistemology, ethics, and science is futile

Roman Leventov

7 Apr 2023 8:22 UTC

20 points

2 comments, 2 min read

Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem

matolcsid

26 Apr 2023 21:39 UTC

17 points

6 comments, 4 min read

Shallow review of live agendas in alignment & safety

technicalities and Stag

27 Nov 2023 11:10 UTC

318 points

69 comments, 29 min read

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo

4 Dec 2023 22:58 UTC

36 points

0 comments, 35 min read

An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility

Audere

2 May 2023 6:52 UTC

65 points

13 comments, 9 min read

Towards Measures of Optimisation

mattmacdermott and Alexander Gietelink Oldenziel

12 May 2023 15:29 UTC

53 points

37 comments, 4 min read

Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

catubc

31 May 2023 21:18 UTC

24 points

4 comments, 11 min read

Gearing Up for Long Timelines in a Hard World

Dalcy

14 Jul 2023 6:11 UTC

13 points

0 comments, 4 min read

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel

7 Aug 2023 15:52 UTC

35 points

9 comments, 1 min read

Another take on agent foundations: formalizing zero-shot reasoning

zhukeepa

1 Jul 2018 6:12 UTC

60 points

20 comments, 12 min read

Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

riceissa and Davidmanheim

27 Jan 2022 13:13 UTC

27 points

0 comments, 1 min read

(arxiv.org)

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille

21 Apr 2022 16:38 UTC

6 points

1 comment, 1 min read

Understanding Selection Theorems

adamk

28 May 2022 1:49 UTC

41 points

3 comments, 7 min read

Bridging Expected Utility Maximization and Optimization

Whispermute

5 Aug 2022 8:18 UTC

25 points

5 comments, 14 min read

Discovering Agents

zac_kenton

18 Aug 2022 17:33 UTC

73 points

11 comments, 6 min read

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman Leventov

12 Jan 2023 16:43 UTC

17 points

2 comments, 2 min read

(arxiv.org)

Normative vs Descriptive Models of Agency

mattmacdermott

2 Feb 2023 20:28 UTC

26 points

5 comments, 4 min read

Performance guarantees in classical learning theory and infra-Bayesianism

matolcsid

28 Feb 2023 18:37 UTC

9 points

4 comments, 31 min read
