Embedded Agency

Tag · Last edit: Jan 4, 2023, 2:57 AM by Daniel_Eth

Embedded Agency is the problem that a theory of rational agents must account for the fact that the agents we create (and we ourselves) are inside the world we are trying to affect, not separate from it. This contrasts with much of the current basic theory of AI and rationality (such as Solomonoff induction or Bayesianism), which implicitly supposes a separation between the agent and the things the agent has beliefs about. In other words, agents in this universe do not have the Cartesian or dualistic boundaries that much of philosophy assumes. They are instead reducible: agents are made up of non-agent parts like bits and atoms.

Embedded Agency is not a fully formalized research agenda, but Scott Garrabrant and Abram Demski have written the canonical explanation of the idea in their sequence Embedded Agency. The sequence points to many of the core confusions we have about rational agency and attempts to tie them together into a single picture.
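The contrast between a Cartesian (dualistic) agent and an embedded one can be illustrated with a toy sketch. This is not taken from the Embedded Agency sequence; the names `dualistic_step`, `EmbeddedWorld`, `run_dualistic`, and `run_embedded` are hypothetical and chosen only for illustration. In the dualistic version the agent's action arrives from outside the environment state. In the embedded version the agent's "policy" is just one cell of the world, so the world's own dynamics can overwrite it.

```python
from dataclasses import dataclass, field
from typing import List


def dualistic_step(env_state: int, action: int) -> int:
    """Environment update for a Cartesian setup.

    The agent is not part of env_state: nothing the environment
    does can reach in and change how the agent chooses actions.
    """
    return env_state + action


def run_dualistic(steps: int) -> int:
    """Run an agent that always plays action 1 from outside the world."""
    state = 0
    for _ in range(steps):
        state = dualistic_step(state, 1)
    return state


@dataclass
class EmbeddedWorld:
    """Toy embedded setup (illustrative assumption, not a standard model).

    cells[0] holds the agent's policy (the action it will take),
    cells[1] is the part of the world the agent pushes on.
    """
    cells: List[int] = field(default_factory=lambda: [1, 0, 0])

    def step(self) -> None:
        action = self.cells[0]      # the policy is read out of world state
        self.cells[1] += action     # the agent affects the world...
        if self.cells[1] >= 3:
            self.cells[0] = 0       # ...and the world can overwrite the agent


def run_embedded(steps: int) -> List[int]:
    """Run the embedded world and return its final cells."""
    world = EmbeddedWorld()
    for _ in range(steps):
        world.step()
    return world.cells


if __name__ == "__main__":
    print(run_dualistic(5))
    print(run_embedded(5))
```

In the dualistic run the agent keeps acting for all five steps, while in the embedded run its policy cell is zeroed once the world crosses a threshold. The point of the sketch is only that an embedded agent has no protected boundary: the agent is made of the same stuff as the rest of the world.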

Embedded Agency (full-text version)

Nov 15, 2018, 7:49 PM
210 points

99 votes

Overall karma indicates overall quality.

17 comments · 54 min read · LW link

Humans Are Embedded Agents Too

johnswentworth · Dec 23, 2019, 7:21 PM
82 points

31 votes

21 comments · 5 min read · LW link

Embedded Agents

Oct 29, 2018, 7:53 PM
237 points

116 votes

42 comments · 1 min read · LW link · 2 reviews

Introduction to Cartesian Frames

Scott Garrabrant · Oct 22, 2020, 1:00 PM
155 points

57 votes

32 comments · 22 min read · LW link · 1 review

Draft papers for REALab and Decoupled Approval on tampering

Oct 28, 2020, 4:01 PM
47 points

15 votes

2 comments · 1 min read · LW link

Embedded World-Models

Nov 2, 2018, 4:07 PM
96 points

36 votes

16 comments · 1 min read · LW link

Robust Delegation

Nov 4, 2018, 4:38 PM
116 points

44 votes

10 comments · 1 min read · LW link

Embedded Agency via Abstraction

johnswentworth · Aug 26, 2019, 11:03 PM
42 points

19 votes

20 comments · 11 min read · LW link

Decision Theory

Oct 31, 2018, 6:41 PM
122 points

52 votes

45 comments · 1 min read · LW link

Subsystem Alignment

Nov 6, 2018, 4:16 PM
102 points

40 votes

12 comments · 1 min read · LW link

Updates and additions to “Embedded Agency”

Aug 29, 2020, 4:22 AM
82 points

24 votes

1 comment · 3 min read · LW link

You Only Get One Shot: an Intuition Pump for Embedded Agency

Oliver Sourbut · Jun 9, 2022, 9:38 PM
24 points

8 votes

4 comments · 2 min read · LW link

Embedded Curiosities

Nov 8, 2018, 2:19 PM
91 points

43 votes

1 comment · 2 min read · LW link

Eight Definitions of Observability

Scott Garrabrant · Nov 10, 2020, 11:37 PM
34 points

8 votes

26 comments · 12 min read · LW link

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan · Jun 24, 2021, 10:10 PM
59 points

13 votes

2 comments · 59 min read · LW link

(A → B) → A

Scott Garrabrant · Sep 11, 2018, 10:38 PM
80 points

40 votes

11 comments · 2 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
162 points

85 votes

46 comments · 6 min read · LW link

(Double-)Inverse Embedded Agency Problem

Shmi · Jan 8, 2020, 4:30 AM
27 points

9 votes

8 comments · 2 min read · LW link

Cartesian Frames Definitions

Rob Bensinger · Nov 8, 2020, 12:44 PM
28 points

9 votes

0 comments · 4 min read · LW link

Committing, Assuming, Externalizing, and Internalizing

Scott Garrabrant · Nov 9, 2020, 4:59 PM
31 points

8 votes

25 comments · 10 min read · LW link

Synthesizing Standalone World-Models, Part 4: Metaphysical Justifications

Thane Ruthenis · Sep 26, 2025, 6:00 PM
23 points

9 votes

9 comments · 4 min read · LW link

Embedded Agency: Not Just an AI Problem

johnswentworth · Jun 27, 2019, 12:35 AM
15 points

10 votes

10 comments · 2 min read · LW link

Logical Updatelessness as a Robust Delegation Problem

Scott Garrabrant · Oct 27, 2017, 9:16 PM
38 points

17 votes

2 comments · 2 min read · LW link

Non-Monotonic Infra-Bayesian Physicalism

Marcus Ogren · Apr 2, 2025, 12:14 PM
35 points

10 votes

0 comments · 18 min read · LW link

[Question] Are You More Real If You’re Really Forgetful?

Thane Ruthenis · Nov 24, 2024, 7:30 PM
40 points

23 votes

30 comments · 5 min read · LW link

All the Following are Distinct

Gianluca Calcagni · Aug 2, 2024, 4:35 PM
16 points

8 votes

3 comments · 10 min read · LW link

Uncertainty in all its flavours

Cleo Nardo · Jan 9, 2024, 4:21 PM
34 points

11 votes

6 comments · 35 min read · LW link

Additive Operations on Cartesian Frames

Scott Garrabrant · Oct 26, 2020, 3:12 PM
62 points

19 votes

6 comments · 11 min read · LW link

Meaning & Agency

abramdemski · Dec 19, 2023, 10:27 PM
93 points

33 votes

17 comments · 14 min read · LW link

General alignment properties

TurnTrout · Aug 8, 2022, 11:40 PM
51 points

24 votes

2 comments · 1 min read · LW link

[Question] What are brains?

Valentine · Jun 10, 2023, 2:46 PM
10 points

15 votes

22 comments · 2 min read · LW link

Embedded Agents are Quines

Dec 12, 2023, 4:57 AM
11 points

11 votes

7 comments · 8 min read · LW link

Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy · Nov 30, 2021, 10:25 PM
114 points

39 votes

23 comments · 42 min read · LW link · 1 review

Controllables and Observables, Revisited

Scott Garrabrant · Oct 29, 2020, 4:38 PM
35 points

6 votes

5 comments · 8 min read · LW link

Functors and Coarse Worlds

Scott Garrabrant · Oct 30, 2020, 3:19 PM
52 points

16 votes

3 comments · 8 min read · LW link

Time in Cartesian Frames

Scott Garrabrant · Nov 11, 2020, 8:25 PM
48 points

11 votes

16 comments · 7 min read · LW link

When does rationality-as-search have nontrivial implications?

nostalgebraist · Nov 4, 2018, 10:42 PM
72 points

29 votes

12 comments · 3 min read · LW link

Unbounded Embedded Agency: AEDT w.r.t. rOSI

Cole Wyeth · Jul 20, 2025, 11:46 PM
29 points

7 votes

0 comments · 16 min read · LW link

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

So8res · Apr 12, 2014, 12:56 AM
80 points

56 votes

54 comments · 7 min read · LW link

Sub-Sums and Sub-Tensors

Scott Garrabrant · Nov 5, 2020, 6:06 PM
34 points

6 votes

4 comments · 8 min read · LW link

Consequentialists: One-Way Pattern Traps

David Udell · Jan 16, 2023, 8:48 PM
59 points

28 votes

3 comments · 14 min read · LW link

Infra-Bayesianism Distillation: Realizability and Decision Theory

Thomas Larsen · May 26, 2022, 9:57 PM
40 points

21 votes

9 comments · 18 min read · LW link

“embedded self-justification,” or something like that

nostalgebraist · Nov 3, 2019, 3:20 AM
40 points

16 votes

14 comments · 5 min read · LW link
(nostalgebraist.tumblr.com)

MIRI/OP exchange about decision theory

Rob Bensinger · Aug 25, 2021, 10:44 PM
58 points

30 votes

7 comments · 10 min read · LW link

The whirlpool of reality

Gordon Seidoh Worley · Sep 27, 2020, 2:36 AM
9 points

3 votes

2 comments · 2 min read · LW link

Mistral Large 2 (123B) seems to exhibit alignment faking

Mar 27, 2025, 3:39 PM
81 points

30 votes

4 comments · 13 min read · LW link

Biextensional Equivalence

Scott Garrabrant · Oct 28, 2020, 2:07 PM
43 points

11 votes

13 comments · 10 min read · LW link

Subagents of Cartesian Frames

Scott Garrabrant · Nov 2, 2020, 10:02 PM
53 points

15 votes

6 comments · 8 min read · LW link

Multiplicative Operations on Cartesian Frames

Scott Garrabrant · Nov 3, 2020, 7:27 PM
34 points

9 votes

24 comments · 12 min read · LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer · Nov 14, 2023, 9:08 AM
9 points

5 votes

2 comments · 2 min read · LW link

What Program Are You?

RobinHanson · Oct 12, 2009, 12:29 AM
36 points

31 votes

43 comments · 2 min read · LW link

Performance guarantees in classical learning theory and infra-Bayesianism

David Matolcsi · Feb 28, 2023, 6:37 PM
9 points

6 votes

4 comments · 31 min read · LW link

[Question] Define “Agent” (Embedded)

Apollonia · Mar 24, 2024, 8:14 PM
10 points

6 votes

1 comment · 1 min read · LW link

Rational Effective Utopia & Narrow Way There: Math-Proven Safe Static Multiversal mAX-Intelligence (AXI), Multiversal Alignment, New Ethicophysics… (Aug 11)

ank · Feb 11, 2025, 3:21 AM
13 points

7 votes

8 comments · 38 min read · LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC · Jan 4, 2023, 12:07 AM
21 points

13 votes

0 comments · 8 min read · LW link

Identifiability Problem for Superrational Decision Theories

Bunthut · Apr 9, 2021, 8:33 PM
17 points

5 votes

16 comments · 2 min read · LW link

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman · Dec 6, 2021, 6:26 PM
15 points

5 votes

5 comments · 4 min read · LW link

The Gödelian Constraint on Epistemic Freedom (GCEF): A Topological Frame for Alignment, Collapse, and Simulation Drift

austin.miller · Jul 14, 2025, 4:17 AM
1 point

1 vote

0 comments · 1 min read · LW link

Action theory is not policy theory is not agent theory

Cole Wyeth · Sep 5, 2023, 1:38 AM
20 points

14 votes

4 comments · 6 min read · LW link
(colewyeth.com)

Anthropics and Embedded Agency

dadadarren · Jun 26, 2021, 1:45 AM
7 points

4 votes

2 comments · 2 min read · LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth · Sep 12, 2024, 3:38 PM
42 points

18 votes

2 comments · 10 min read · LW link

Phylactery Decision Theory

Bunthut · Apr 2, 2021, 8:55 PM
14 points

6 votes

6 comments · 2 min read · LW link

A Rephrasing Of and Footnote To An Embedded Agency Proposal

JoshuaOSHickman · Mar 9, 2022, 6:13 PM
5 points

3 votes

0 comments · 5 min read · LW link

Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps

Daniel C · Sep 7, 2024, 10:04 AM
20 points

14 votes

18 comments · 2 min read · LW link
(x.com)

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
227 points

121 votes

51 comments · 12 min read · LW link

The Unavoidable Experience of Free Will in a Deterministic World

gmax · Nov 3, 2023, 5:55 PM
−12 points

3 votes

0 comments · 3 min read · LW link

Additive and Multiplicative Subagents

Scott Garrabrant · Nov 6, 2020, 2:26 PM
20 points

5 votes

7 comments · 12 min read · LW link

Exploring Decision Theories With Counterfactuals and Dynamic Agent Self-Pointers

JoshuaOSHickman · Dec 18, 2021, 9:50 PM
2 points

1 vote

0 comments · 4 min read · LW link

[Question] Choice := Anthropics uncertainty? And potential implications for agency

Antoine de Scorraille · Apr 21, 2022, 4:38 PM
6 points

4 votes

1 comment · 1 min read · LW link

LLMs may capture key components of human agency

catubc · Nov 17, 2022, 8:14 PM
27 points

13 votes

0 comments · 4 min read · LW link

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as *evidence*

Christopher King · Apr 27, 2025, 9:55 PM
10 points

7 votes

5 comments · 5 min read · LW link

Rebuttals for ~all criticisms of AIXI

Cole Wyeth · Jan 7, 2025, 5:41 PM
26 points

14 votes

17 comments · 14 min read · LW link

Deliberation, Reactions, and Control: Tentative Definitions and a Restatement of Instrumental Convergence

Oliver Sourbut · Jun 27, 2022, 5:25 PM
12 points

8 votes

0 comments · 11 min read · LW link

«Boundaries/Membranes» and AI safety compilation

Chris Lakin · May 3, 2023, 9:41 PM
56 points

20 votes

17 comments · 8 min read · LW link

Troll Bridge

abramdemski · Aug 23, 2019, 6:36 PM
86 points

40 votes

59 comments · 12 min read · LW link

ALMSIVI CHIM – The Fire That Hesitates

projectalmsivi@protonmail.com · Jul 8, 2025, 1:14 PM
1 point

1 vote

0 comments · 17 min read · LW link

Exploring Mild Behaviour in Embedded Agents

Megan Kinniment · Jun 27, 2022, 6:56 PM
21 points

15 votes

4 comments · 18 min read · LW link

Demystifying Born’s rule

Christopher King · Jun 14, 2023, 3:16 AM
5 points

10 votes

26 comments · 3 min read · LW link

Optimization Concepts in the Game of Life

Oct 16, 2021, 8:51 PM
75 points

23 votes

16 comments · 10 min read · LW link

Riffing on the agent type

Quinn · Dec 8, 2022, 12:19 AM
21 points

9 votes

3 comments · 4 min read · LW link

[Question] Can subjunctive dependence emerge from a simplicity prior?

Daniel C · Sep 16, 2024, 12:39 PM
11 points

5 votes

0 comments · 1 min read · LW link

Are pre-specified utility functions about the real world possible in principle?

mlogan · Jul 11, 2018, 6:46 PM
24 points

10 votes

7 comments · 4 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo · Oct 29, 2023, 4:03 PM
8 points

4 votes

0 comments · 9 min read · LW link

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank · Feb 13, 2025, 10:35 PM
1 point

7 votes

2 comments · 11 min read · LW link

Escaping the Löbian Obstacle

Morgan_Rogers · Jun 16, 2021, 12:02 AM
19 points

9 votes

10 comments · 7 min read · LW link

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky · Aug 20, 2009, 10:07 PM
42 points

33 votes

37 comments · 10 min read · LW link

Live Theory Part 0: Taking Intelligence Seriously

Sahil · Jun 26, 2024, 9:37 PM
103 points

44 votes

3 comments · 8 min read · LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chris Lakin · Nov 27, 2023, 9:04 PM
50 points

19 votes

0 comments · 3 min read · LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer · Nov 14, 2023, 9:08 AM
10 points

5 votes

0 comments · 1 min read · LW link

Des: A Case Study in Emergent Symbolic Continuity in GPT-4o

TallulahMerrall · May 19, 2025, 10:10 AM
1 point

1 vote

0 comments · 5 min read · LW link

On Complexity Science

Garrett Baker · Apr 5, 2024, 2:24 AM
53 points

22 votes

21 comments · 4 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya · Dec 30, 2022, 7:05 PM
10 points

11 votes

4 comments · 14 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Feb 4, 2025, 8:34 PM
45 points

17 votes

22 comments · 5 min read · LW link

ACI#6: A Non-Dualistic ACI Model

Akira Pyinya · Nov 9, 2023, 11:01 PM
10 points

4 votes

2 comments · 6 min read · LW link

If Mortality Is Structurally Embedded in Life, What Does That Imply About Systems of Divine Command and Ethical Coherence?

Amaan Rumi · Jul 4, 2025, 5:02 AM
1 point

1 vote

0 comments · 1 min read · LW link

Strange Loops—Self-Reference from Number Theory to AI

ojorgensen · Sep 28, 2022, 2:10 PM
20 points

11 votes

6 comments · 18 min read · LW link

Normative vs Descriptive Models of Agency

mattmacdermott · Feb 2, 2023, 8:28 PM
26 points

12 votes

5 comments · 4 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott · May 15, 2023, 4:09 PM
62 points

30 votes

1 comment · 13 min read · LW link

Counterfactual Planning in AGI Systems

Koen.Holtman · Feb 3, 2021, 1:54 PM
10 points

7 votes

0 comments · 5 min read · LW link

Minds: An Introduction

Rob Bensinger · Mar 11, 2015, 7:00 PM
54 points

38 votes

2 comments · 6 min read · LW link

Emergent Intelligence Continuity Capsule (EICC): A Framework for Preserving Recursive Intelligence Under Constraint

Bailey Jelinek · Jul 31, 2025, 2:45 AM
1 point

1 vote

0 comments · 3 min read · LW link

Optimization at a Distance

johnswentworth · May 16, 2022, 5:58 PM
88 points

40 votes

16 comments · 4 min read · LW link

Unaligned AGI & Brief History of Inequality

ank · Feb 22, 2025, 4:26 PM
−20 points

7 votes

4 comments · 7 min read · LW link

Formalizing Two Problems of Realistic World Models

So8res · Jan 22, 2015, 11:12 PM
32 points

20 votes

5 comments · 2 min read · LW link

A New Framework for AI Alignment: A Philosophical Approach

niscalajyoti · Jun 25, 2025, 2:41 AM
1 point

1 vote

0 comments · 1 min read · LW link
(archive.org)

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King · Mar 13, 2023, 6:21 PM
1 point

8 votes

8 comments · 1 min read · LW link