

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau · 20 Feb 2024 0:02 UTC
28 points
6 comments · 10 min read · LW link

Some thoughts on “The Nature of Counterfactuals”

tailcalled · 16 Jan 2022 18:12 UTC
20 points
11 comments · 11 min read · LW link

Circular Counterfactuals “Only that which Happens is Possible”

JohnBuridan · 23 Mar 2022 14:40 UTC
4 points
15 comments · 9 min read · LW link

Results: Circular Dependency of Counterfactuals Prize

Chris_Leong · 5 Apr 2022 6:29 UTC
19 points
0 comments · 1 min read · LW link

Four factors that moderate the intensity of emotions

Ruby · 24 Nov 2018 20:40 UTC
59 points
11 comments · 8 min read · LW link

Getting Unstuck on Counterfactuals

Chris_Leong · 20 Jul 2022 5:31 UTC
7 points
1 comment · 2 min read · LW link

The Many Faces of Infra-Beliefs

Diffractor · 6 Apr 2021 10:43 UTC
30 points
6 comments · 63 min read · LW link

Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong · 5 Aug 2022 19:03 UTC
17 points
35 comments · 2 min read · LW link

My Current Take on Counterfactuals

abramdemski · 9 Apr 2021 17:51 UTC
53 points
57 comments · 25 min read · LW link

The Nature of Counterfactuals

Chris_Leong · 5 Jun 2021 9:18 UTC
15 points
18 comments · 4 min read · LW link

Agency and the unreliable autonomous car

Alex Flint · 7 Jul 2021 14:58 UTC
29 points
24 comments · 10 min read · LW link

Counterfactual Contracts

harsimony · 16 Sep 2021 15:20 UTC
10 points
4 comments · 9 min read · LW link

Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty

Chris_Leong · 16 Sep 2020 10:34 UTC
9 points
5 comments · 2 min read · LW link

QACI: question-answer counterfactual intervals

Tamsin Leake · 24 Oct 2022 13:08 UTC
22 points
0 comments · 4 min read · LW link

Decisions: Ontologically Shifting to Determinism

Chris_Leong · 21 Dec 2022 12:41 UTC
8 points
11 comments · 6 min read · LW link

Causality and determinism in social science - An investigation using Pearl’s causal ladder

tailcalled · 3 Jan 2022 17:51 UTC
12 points
10 comments · 9 min read · LW link

Counterfactuals from ensembles of peers

David Johnston · 4 Jan 2022 7:01 UTC
3 points
4 comments · 7 min read · LW link

Counterfactuals for Perfect Predictors

Chris_Leong · 6 Aug 2018 12:24 UTC
12 points
17 comments · 6 min read · LW link

Counterfactual Induction

Diffractor · 17 Dec 2019 5:03 UTC
22 points
7 comments · 6 min read · LW link

Counterfactual Reprogramming Decision Theory

lukeprog · 10 Sep 2012 1:35 UTC
18 points
8 comments · 1 min read · LW link

Counterfactual Calculation and Observational Knowledge

Vladimir_Nesov · 31 Jan 2011 16:28 UTC
20 points
188 comments · 1 min read · LW link

Counterfactuals as a matter of Social Convention

Chris_Leong · 30 Nov 2019 10:35 UTC
10 points
4 comments · 2 min read · LW link

Counterfactual trade

owencb · 9 Mar 2015 13:23 UTC
22 points
19 comments · 3 min read · LW link

Counterfactual Mugging and Logical Uncertainty

Vladimir_Nesov · 5 Sep 2009 22:31 UTC
11 points
21 comments · 3 min read · LW link

Counterfactuals: Smoking Lesion vs. Newcomb’s

Chris_Leong · 8 Dec 2019 21:02 UTC
9 points
24 comments · 3 min read · LW link

Counterfactually uninfluenceable agents

Stuart_Armstrong · 2 Jun 2017 16:17 UTC
11 points
0 comments · 2 min read · LW link

Counterfactuals and reflective oracles

Nisan · 5 Sep 2018 8:54 UTC
9 points
0 comments · 6 min read · LW link

Counterfactual Induction (Algorithm Sketch, Fixpoint proof)

Diffractor · 17 Dec 2019 5:04 UTC
5 points
2 comments · 7 min read · LW link

[Question] Counterfactual Mugging: Why should you pay?

Chris_Leong · 17 Dec 2019 22:16 UTC
6 points
59 comments · 3 min read · LW link

Counterfactual mugging: alien abduction edition

Emile · 28 Sep 2010 21:25 UTC
4 points
18 comments · 1 min read · LW link

Counterfactual Induction (Lemma 4)

Diffractor · 17 Dec 2019 5:05 UTC
4 points
0 comments · 7 min read · LW link

Counterfactual do-what-I-mean

Stuart_Armstrong · 27 Oct 2016 13:54 UTC
5 points
3 comments · 1 min read · LW link

Counterfactual Mugging v. Subjective Probability

MBlume · 20 Jul 2009 16:31 UTC
4 points
32 comments · 1 min read · LW link

Counterfactuals on POMDP

Stuart_Armstrong · 2 Jun 2017 16:30 UTC
2 points
0 comments · 2 min read · LW link

Counterfactual self-defense

MrMind · 23 Nov 2012 10:15 UTC
2 points
9 comments · 1 min read · LW link

Counterfactual do-what-I-mean

Stuart_Armstrong · 27 Oct 2016 13:53 UTC
0 points
3 comments · 1 min read · LW link

Logical Counterfactuals and Proposition graphs, Part 1

Donald Hobson · 22 Aug 2019 22:06 UTC
20 points
0 comments · 2 min read · LW link

Logical Counterfactuals are low-res

shminux · 15 Oct 2018 3:36 UTC
23 points
14 comments · 1 min read · LW link

The Counterfactual Prisoner’s Dilemma

Chris_Leong · 21 Dec 2019 1:44 UTC
21 points
17 comments · 3 min read · LW link

Logical Counterfactuals & the Cooperation Game

Chris_Leong · 14 Aug 2018 14:00 UTC
16 points
26 comments · 2 min read · LW link

Logical Counterfactuals and Proposition graphs, Part 2

Donald Hobson · 31 Aug 2019 20:58 UTC
13 points
0 comments · 3 min read · LW link

Can Counterfactuals Be True?

Eliezer Yudkowsky · 24 Jul 2008 4:40 UTC
32 points
47 comments · 4 min read · LW link

A counterfactual and hypothetical note on AI safety design

Stuart_Armstrong · 11 Mar 2015 16:20 UTC
13 points
1 comment · 1 min read · LW link

Logical Counterfactuals and Proposition graphs, Part 3

Donald Hobson · 5 Sep 2019 15:03 UTC
6 points
0 comments · 4 min read · LW link

Extremely Counterfactual Mugging or: the gist of Transparent Newcomb

Bongo · 9 Feb 2011 15:20 UTC
10 points
79 comments · 1 min read · LW link

Provability Counterfactuals vs Three Axioms of Galles and Pearl

IAFF-User-52 · 30 Aug 2015 2:48 UTC
6 points
0 comments · 1 min read · LW link

Logical counterfactuals for random algorithms

Vanessa Kosoy · 6 Jan 2016 13:29 UTC
5 points
0 comments · 10 min read · LW link

[LINK] Counterfactual Strategies

Strilanc · 17 Jun 2014 19:29 UTC
5 points
14 comments · 1 min read · LW link

Logical Counterfactuals Consistent Under Self-Modification

abramdemski · 15 Dec 2015 6:38 UTC
3 points
2 comments · 8 min read · LW link

Logical counterfactuals and differential privacy

Nisan · 4 Feb 2018 0:17 UTC
1 point
1 comment · 5 min read · LW link

What makes counterfactuals comparable?

Chris_Leong · 24 Apr 2020 22:47 UTC
11 points
6 comments · 3 min read · LW link

The odd counterfactuals of playing chicken

Benya_Fallenstein · 2 Feb 2015 7:15 UTC
6 points
0 comments · 8 min read · LW link

Hazing as Counterfactual Mugging?

SilasBarta · 11 Oct 2010 14:17 UTC
5 points
8 comments · 1 min read · LW link

Third-person counterfactuals

Benya_Fallenstein · 3 Feb 2015 1:13 UTC
4 points
4 comments · 6 min read · LW link

The many counterfactuals of counterfactual mugging

Scott Garrabrant · 12 Apr 2016 20:04 UTC
2 points
3 comments · 2 min read · LW link

Stabilizing logical counterfactuals by pseudorandomization

Vanessa Kosoy · 25 May 2016 12:05 UTC
1 point
2 comments · 8 min read · LW link

Un-manipulable counterfactuals

Stuart_Armstrong · 12 Feb 2015 19:51 UTC
1 point
5 comments · 1 min read · LW link

Orthogonality: action counterfactuals

Stuart_Armstrong · 17 Feb 2015 21:04 UTC
0 points
0 comments · 1 min read · LW link

Newcomblike problem: Counterfactual Informant

Clippy · 12 Apr 2012 20:25 UTC
−3 points
24 comments · 1 min read · LW link

[Question] Would solving logical counterfactuals solve anthropics?

Chris_Leong · 5 Apr 2019 11:08 UTC
20 points
52 comments · 1 min read · LW link

Optimal and Causal Counterfactual Worlds

Scott Garrabrant · 12 May 2015 3:16 UTC
14 points
4 comments · 3 min read · LW link

Sleeping Beauty gets counterfactually mugged

Stuart_Armstrong · 26 Mar 2009 11:44 UTC
4 points
34 comments · 2 min read · LW link

Causal graphs and counterfactuals

Stuart_Armstrong · 30 Aug 2016 16:12 UTC
7 points
2 comments · 1 min read · LW link

Transitive negotiations with counterfactual agents

Scott Garrabrant · 20 Oct 2016 23:27 UTC
4 points
0 comments · 1 min read · LW link

Agents detecting agents: counterfactual versus influence

Stuart_Armstrong · 18 Sep 2015 16:17 UTC
5 points
4 comments · 7 min read · LW link

Humans get different counterfactuals

Stuart_Armstrong · 23 Mar 2015 14:54 UTC
4 points
2 comments · 1 min read · LW link

Causal graphs and counterfactuals

Stuart_Armstrong · 30 Aug 2016 16:06 UTC
0 points
2 comments · 1 min read · LW link

The Curse Of The Counterfactual

pjeby · 1 Nov 2019 18:34 UTC
125 points
34 comments · 19 min read · LW link · 1 review

Two Alternatives to Logical Counterfactuals

jessicata · 1 Apr 2020 9:48 UTC
38 points
61 comments · 5 min read · LW link

Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.

RyanCarey · 21 Oct 2018 12:03 UTC
23 points
1 comment · 6 min read · LW link

Standard ML Oracles vs Counterfactual ones

Stuart_Armstrong · 10 Oct 2018 20:01 UTC
18 points
5 comments · 6 min read · LW link

An environment for studying counterfactuals

Nisan · 11 Jul 2018 0:14 UTC
15 points
6 comments · 3 min read · LW link

On the Role of Counterfactuals in Learning

Max Kanwal · 11 Jul 2018 2:45 UTC
11 points
2 comments · 3 min read · LW link

Does TDT pay in Counterfactual Mugging?

Bongo · 29 Nov 2010 21:31 UTC
4 points
5 comments · 1 min read · LW link

You have just been Counterfactually Mugged!

CronoDAS · 19 Aug 2009 22:24 UTC
7 points
25 comments · 1 min read · LW link

[Question] Decisions with Non-Logical Counterfactuals: request for input

reavowed · 24 Oct 2019 17:23 UTC
3 points
11 comments · 3 min read · LW link

[Question] What are some concrete problems about logical counterfactuals?

Chris_Leong · 16 Dec 2018 10:20 UTC
25 points
4 comments · 1 min read · LW link

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala · 8 Mar 2012 5:39 UTC
86 points
40 comments · 9 min read · LW link

Divergence on Evidence Due to Differing Priors - A Political Case Study

Davidmanheim · 16 Sep 2019 11:01 UTC
27 points
3 comments · 3 min read · LW link

A useful level distinction

Charlie Steiner · 24 Feb 2018 6:39 UTC
8 points
4 comments · 2 min read · LW link

JFK was not assassinated: prior probability zero events

Stuart_Armstrong · 27 Apr 2016 11:47 UTC
37 points
38 comments · 3 min read · LW link

Motivating a Semantics of Logical Counterfactuals

Sam_A_Barnett · 22 Sep 2017 1:10 UTC
22 points
3 comments · 2 min read · LW link

Open Problems Regarding Counterfactuals: An Introduction For Beginners

Diffractor · 18 Jul 2017 2:21 UTC
21 points
6 comments · 1 min read · LW link

UDT might not pay a Counterfactual Mugger

winwonce · 21 Nov 2020 23:27 UTC
5 points
18 comments · 2 min read · LW link

Counterfactual Planning in AGI Systems

Koen.Holtman · 3 Feb 2021 13:54 UTC
10 points
0 comments · 5 min read · LW link

Graphical World Models, Counterfactuals, and Machine Learning Agents

Koen.Holtman · 17 Feb 2021 11:07 UTC
6 points
2 comments · 10 min read · LW link

Creating AGI Safety Interlocks

Koen.Holtman · 5 Feb 2021 12:01 UTC
7 points
4 comments · 8 min read · LW link

Safely controlling the AGI agent reward function

Koen.Holtman · 17 Feb 2021 14:47 UTC
8 points
0 comments · 5 min read · LW link

What is a Counterfactual: An Elementary Introduction to the Causal Hierarchy

Darmani · 2 Jan 2022 3:46 UTC
11 points
2 comments · 5 min read · LW link

Initial Thoughts on Dissolving “Couldness”

DragonGod · 22 Sep 2022 21:23 UTC
6 points
1 comment · 3 min read · LW link

[Sketch] Validity Criterion for Logical Counterfactuals

DragonGod · 11 Oct 2022 13:31 UTC
6 points
0 comments · 4 min read · LW link

Against the normative realist’s wager

Joe Carlsmith · 13 Oct 2022 16:35 UTC
16 points
9 comments · 23 min read · LW link

An Ontology for Strategic Epistemology

StrivingForLegibility · 28 Dec 2023 22:11 UTC
9 points
0 comments · 5 min read · LW link

Why are counterfactuals elusive?

Martín Soto · 3 Mar 2023 20:13 UTC
14 points
6 comments · 2 min read · LW link

Distributed Strategic Epistemology

StrivingForLegibility · 28 Dec 2023 22:12 UTC
11 points
0 comments · 3 min read · LW link

Logical Line-Of-Sight Makes Games Sequential or Loopy

StrivingForLegibility · 19 Jan 2024 4:05 UTC
39 points
0 comments · 7 min read · LW link

Counterfactual Mechanism Networks

StrivingForLegibility · 30 Jan 2024 20:30 UTC
4 points
0 comments · 5 min read · LW link

To Boldly Code

StrivingForLegibility · 26 Jan 2024 18:25 UTC
25 points
4 comments · 3 min read · LW link

Incorporating Mechanism Design Into Decision Theory

StrivingForLegibility · 26 Jan 2024 18:25 UTC
17 points
4 comments · 4 min read · LW link

Timeless Control

Eliezer Yudkowsky · 7 Jun 2008 5:16 UTC
47 points
69 comments · 9 min read · LW link

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky · 20 Aug 2009 22:07 UTC
42 points
37 comments · 10 min read · LW link

Counterfactuals, thick and thin

Nisan · 31 Jul 2018 15:43 UTC
28 points
11 comments · 2 min read · LW link

Deconfusing Logical Counterfactuals

Chris_Leong · 30 Jan 2019 15:13 UTC
27 points
16 comments · 11 min read · LW link

Conditioning, Counterfactuals, Exploration, and Gears

Diffractor · 10 Jul 2018 22:11 UTC
28 points
1 comment · 5 min read · LW link

Counterfactual Mugging Poker Game

Scott Garrabrant · 13 Jun 2018 23:34 UTC
111 points
3 comments · 1 min read · LW link

Counterfactual Mugging

Vladimir_Nesov · 19 Mar 2009 6:08 UTC
80 points
296 comments · 2 min read · LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai · 10 Sep 2019 8:29 UTC
48 points
26 comments · 3 min read · LW link

Counterfactual outcome state transition parameters

Anders_H · 27 Jul 2018 21:13 UTC
37 points
1 comment · 6 min read · LW link

Counterfactual resiliency test for non-causal models

Stuart_Armstrong · 30 Aug 2012 17:30 UTC
34 points
78 comments · 7 min read · LW link

Counterfactuals versus the laws of physics

Stuart_Armstrong · 18 Feb 2020 13:21 UTC
16 points
0 comments · 1 min read · LW link

Counterfactuals are an Answer, Not a Question

Chris_Leong · 3 Sep 2019 15:36 UTC
14 points
6 comments · 4 min read · LW link