Counterfactuals

Tag

The Counterfactual Quiet AGI Timeline

Davidmanheim 5 Oct 2025 9:09 UTC

73 points

5 comments, 9 min read

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau 20 Feb 2024 0:02 UTC

28 points

6 comments, 10 min read

My Current Take on Counterfactuals

abramdemski 9 Apr 2021 17:51 UTC

56 points

57 comments, 25 min read

Causality and determinism in social science—An investigation using Pearl’s causal ladder

tailcalled 3 Jan 2022 17:51 UTC

13 points

10 comments, 9 min read

The Nature of Counterfactuals

Chris_Leong 5 Jun 2021 9:18 UTC

16 points

18 comments, 4 min read

Probability Theory Fundamentals 102: Territory that Probability is in the Map of

Ape in the coat 26 Mar 2025 6:40 UTC

10 points

17 comments, 9 min read

Getting Unstuck on Counterfactuals

Chris_Leong 20 Jul 2022 5:31 UTC

7 points

1 comment, 2 min read

Decisions: Ontologically Shifting to Determinism

Chris_Leong 21 Dec 2022 12:41 UTC

8 points

11 comments, 6 min read

[Question] What are the two contradictory theories of how to evaluate counterfactuals?

Said Achmiz 25 Jul 2025 18:43 UTC

29 points

16 comments, 1 min read

Agency and the unreliable autonomous car

Alex Flint 7 Jul 2021 14:58 UTC

29 points

24 comments, 10 min read

Four factors that moderate the intensity of emotions

Ruby 24 Nov 2018 20:40 UTC

65 points

11 comments, 8 min read

Circular Counterfactuals “Only that which Happens is Possible”

SebastianG 23 Mar 2022 14:40 UTC

4 points

15 comments, 9 min read

Some thoughts on “The Nature of Counterfactuals”

tailcalled 16 Jan 2022 18:12 UTC

20 points

11 comments, 11 min read

Counterfactuals are Confusing because of an Ontological Shift

Chris_Leong 5 Aug 2022 19:03 UTC

17 points

35 comments, 2 min read

The Many Faces of Infra-Beliefs

Diffractor 6 Apr 2021 10:43 UTC

30 points

6 comments, 63 min read

Results: Circular Dependency of Counterfactuals Prize

Chris_Leong 5 Apr 2022 6:29 UTC

19 points

0 comments, 1 min read

Counterfactuals from ensembles of peers

David Johnston 4 Jan 2022 7:01 UTC

3 points

4 comments, 7 min read

Counterfactual Contracts

harsimony 16 Sep 2021 15:20 UTC

12 points

4 comments, 9 min read

(harsimony.wordpress.com)

Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty

Chris_Leong 16 Sep 2020 10:34 UTC

13 points

5 comments, 2 min read

Counterfactually uninfluenceable agents

Stuart_Armstrong 2 Jun 2017 16:17 UTC

11 points

0 comments, 2 min read

The odd counterfactuals of playing chicken

Benya_Fallenstein 2 Feb 2015 7:15 UTC

6 points

0 comments, 8 min read

[Question] Decisions with Non-Logical Counterfactuals: request for input

reavowed 24 Oct 2019 17:23 UTC

3 points

11 comments, 3 min read

Counterfactuals are an Answer, Not a Question

Chris_Leong 3 Sep 2019 15:36 UTC

14 points

6 comments, 4 min read

Standard ML Oracles vs Counterfactual ones

Stuart_Armstrong 10 Oct 2018 20:01 UTC

18 points

5 comments, 6 min read

[Sketch] Validity Criterion for Logical Counterfactuals

DragonGod 11 Oct 2022 13:31 UTC

6 points

0 comments, 6 min read

Logical Counterfactuals and Proposition graphs, Part 2

Donald Hobson 31 Aug 2019 20:58 UTC

13 points

0 comments, 3 min read

Counterfactual Planning in AGI Systems

Koen.Holtman 3 Feb 2021 13:54 UTC

10 points

0 comments, 5 min read

Counterfactual self-defense

MrMind 23 Nov 2012 10:15 UTC

2 points

9 comments, 1 min read

Counterfactual Induction (Algorithm Sketch, Fixpoint proof)

Diffractor 17 Dec 2019 5:04 UTC

5 points

2 comments, 7 min read

Counterfactual Mugging Poker Game

Scott Garrabrant 13 Jun 2018 23:34 UTC

134 points

4 comments, 1 min read

Causal graphs and counterfactuals

Stuart_Armstrong 30 Aug 2016 16:06 UTC

0 points

2 comments, 1 min read

Initial Thoughts on Dissolving “Couldness”

DragonGod 22 Sep 2022 21:23 UTC

6 points

1 comment, 3 min read

Motivating a Semantics of Logical Counterfactuals

Sam_A_Barnett 22 Sep 2017 1:10 UTC

23 points

3 comments, 2 min read

Stabilizing logical counterfactuals by pseudorandomization

Vanessa Kosoy 25 May 2016 12:05 UTC

1 point

2 comments, 8 min read

Can Counterfactuals Be True?

Eliezer Yudkowsky 24 Jul 2008 4:40 UTC

33 points

47 comments, 4 min read

[Question] What are some concrete problems about logical counterfactuals?

Chris_Leong 16 Dec 2018 10:20 UTC

25 points

4 comments, 1 min read

On the Role of Counterfactuals in Learning

Max Kanwal 11 Jul 2018 2:45 UTC

13 points

2 comments, 3 min read

Counterfactual resiliency test for non-causal models

Stuart_Armstrong 30 Aug 2012 17:30 UTC

34 points

78 comments, 7 min read

To Boldly Code

StrivingForLegibility 26 Jan 2024 18:25 UTC

26 points

4 comments, 3 min read

Counterfactuals: Smoking Lesion vs. Newcomb’s

Chris_Leong 8 Dec 2019 21:02 UTC

9 points

24 comments, 3 min read

Counterfactual do-what-I-mean

Stuart_Armstrong 27 Oct 2016 13:53 UTC

0 points

3 comments, 1 min read

The Curse Of The Counterfactual

pjeby 1 Nov 2019 18:34 UTC

143 points

35 comments, 19 min read, 1 review

Creating AGI Safety Interlocks

Koen.Holtman 5 Feb 2021 12:01 UTC

7 points

4 comments, 8 min read

Counterfactuals, thick and thin

Nisan 31 Jul 2018 15:43 UTC

29 points

11 comments, 2 min read

Distributed Strategic Epistemology

StrivingForLegibility 28 Dec 2023 22:12 UTC

11 points

0 comments, 3 min read

Incorporating Mechanism Design Into Decision Theory

StrivingForLegibility 26 Jan 2024 18:25 UTC

17 points

4 comments, 4 min read

Against the normative realist’s wager

Joe Carlsmith 13 Oct 2022 16:35 UTC

16 points

9 comments, 23 min read

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai 10 Sep 2019 8:29 UTC

52 points

26 comments, 3 min read

An Ontology for Strategic Epistemology

StrivingForLegibility 28 Dec 2023 22:11 UTC

9 points

0 comments, 5 min read

Causal graphs and counterfactuals

Stuart_Armstrong 30 Aug 2016 16:12 UTC

7 points

2 comments, 1 min read

Counterfactual outcome state transition parameters

Anders_H 27 Jul 2018 21:13 UTC

37 points

1 comment, 6 min read

The Counterfactual Prisoner’s Dilemma

Chris_Leong 21 Dec 2019 1:44 UTC

21 points

17 comments, 3 min read

Transitive negotiations with counterfactual agents

Scott Garrabrant 20 Oct 2016 23:27 UTC

4 points

0 comments, 1 min read

Sleeping Beauty gets counterfactually mugged

Stuart_Armstrong 26 Mar 2009 11:44 UTC

6 points

34 comments, 2 min read

Counterfactual Mugging

Vladimir_Nesov 19 Mar 2009 6:08 UTC

92 points

299 comments, 2 min read

Un-manipulable counterfactuals

Stuart_Armstrong 12 Feb 2015 19:51 UTC

1 point

5 comments, 1 min read

Counterfactual Mugging v. Subjective Probability

MBlume 20 Jul 2009 16:31 UTC

4 points

32 comments, 1 min read

Timeless Decision Theory and Meta-Circular Decision Theory

Eliezer Yudkowsky 20 Aug 2009 22:07 UTC

42 points

37 comments, 10 min read

Hazing as Counterfactual Mugging?

SilasBarta 11 Oct 2010 14:17 UTC

5 points

8 comments, 1 min read

A useful level distinction

Charlie Steiner 24 Feb 2018 6:39 UTC

8 points

4 comments, 2 min read

Logical Counterfactuals and Proposition graphs, Part 3

Donald Hobson 5 Sep 2019 15:03 UTC

6 points

0 comments, 4 min read

JFK was not assassinated: prior probability zero events

Stuart_Armstrong 27 Apr 2016 11:47 UTC

38 points

38 comments, 3 min read

Humans get different counterfactuals

Stuart_Armstrong 23 Mar 2015 14:54 UTC

4 points

2 comments, 1 min read

Optimal and Causal Counterfactual Worlds

Scott Garrabrant 12 May 2015 3:16 UTC

14 points

4 comments, 3 min read

Agents detecting agents: counterfactual versus influence

Stuart_Armstrong 18 Sep 2015 16:17 UTC

5 points

4 comments, 7 min read

You have just been Counterfactually Mugged!

CronoDAS 19 Aug 2009 22:24 UTC

7 points

25 comments, 1 min read

Open Problems Regarding Counterfactuals: An Introduction For Beginners

Diffractor 18 Jul 2017 2:21 UTC

21 points

6 comments, 1 min read

(www.overleaf.com)

An environment for studying counterfactuals

Nisan 11 Jul 2018 0:14 UTC

15 points

6 comments, 3 min read

[LINK] Counterfactual Strategies

Strilanc 17 Jun 2014 19:29 UTC

5 points

14 comments, 1 min read

Divergence on Evidence Due to Differing Priors—A Political Case Study

Davidmanheim 16 Sep 2019 11:01 UTC

27 points

3 comments, 3 min read

Logical Counterfactuals and Proposition graphs, Part 1

Donald Hobson 22 Aug 2019 22:06 UTC

20 points

0 comments, 3 min read

Conditioning, Counterfactuals, Exploration, and Gears

Diffractor 10 Jul 2018 22:11 UTC

28 points

1 comment, 5 min read

Third-person counterfactuals

Benya_Fallenstein 3 Feb 2015 1:13 UTC

4 points

4 comments, 6 min read

Counterfactual Mechanism Networks

StrivingForLegibility 30 Jan 2024 20:30 UTC

5 points

0 comments, 5 min read

Logical Counterfactuals are low-res

Shmi 15 Oct 2018 3:36 UTC

24 points

14 comments, 1 min read

(donerkebabphilosophy.wordpress.com)

The many counterfactuals of counterfactual mugging

Scott Garrabrant 12 Apr 2016 20:04 UTC

2 points

3 comments, 2 min read

Counterfactual Mugging and Logical Uncertainty

Vladimir_Nesov 5 Sep 2009 22:31 UTC

16 points

21 comments, 3 min read

Deconfusing Logical Counterfactuals

Chris_Leong 30 Jan 2019 15:13 UTC

29 points

16 comments, 11 min read

Counterfactual Calculation and Observational Knowledge

Vladimir_Nesov 31 Jan 2011 16:28 UTC

20 points

188 comments, 1 min read

Counterfactuals on POMDP

Stuart_Armstrong 2 Jun 2017 16:30 UTC

2 points

0 comments, 2 min read

Extremely Counterfactual Mugging or: the gist of Transparent Newcomb

Bongo 9 Feb 2011 15:20 UTC

10 points

79 comments, 1 min read

Counterfactual Reprogramming Decision Theory

lukeprog 10 Sep 2012 1:35 UTC

18 points

8 comments, 1 min read

Newcomblike problem: Counterfactual Informant

Clippy 12 Apr 2012 20:25 UTC

−3 points

24 comments, 1 min read

Why are counterfactuals elusive?

Martín Soto 3 Mar 2023 20:13 UTC

14 points

6 comments, 2 min read

Logical Counterfactuals & the Cooperation Game

Chris_Leong 14 Aug 2018 14:00 UTC

16 points

26 comments, 2 min read

Logical Line-Of-Sight Makes Games Sequential or Loopy

StrivingForLegibility 19 Jan 2024 4:05 UTC

40 points

0 comments, 7 min read

Provability Counterfactuals vs Three Axioms of Galles and Pearl

IAFF-User-52 30 Aug 2015 2:48 UTC

6 points

0 comments, 1 min read

(epsilonofdoom.blogspot.com)

What makes counterfactuals comparable?

Chris_Leong 24 Apr 2020 22:47 UTC

11 points

6 comments, 3 min read

Counterfactuals for Perfect Predictors

Chris_Leong 6 Aug 2018 12:24 UTC

12 points

17 comments, 6 min read

[Question] Counterfactual Mugging: Why should you pay?

Chris_Leong 17 Dec 2019 22:16 UTC

12 points

59 comments, 3 min read

Logical Counterfactuals Consistent Under Self-Modification

abramdemski 15 Dec 2015 6:38 UTC

3 points

2 comments, 8 min read

Counterfactuals as a matter of Social Convention

Chris_Leong 30 Nov 2019 10:35 UTC

10 points

4 comments, 2 min read

UDT might not pay a Counterfactual Mugger

winwonce 21 Nov 2020 23:27 UTC

5 points

18 comments, 2 min read

Counterfactuals and reflective oracles

Nisan 5 Sep 2018 8:54 UTC

9 points

0 comments, 6 min read

Logical counterfactuals for random algorithms

Vanessa Kosoy 6 Jan 2016 13:29 UTC

5 points

0 comments, 10 min read

Counterfactuals versus the laws of physics

Stuart_Armstrong 18 Feb 2020 13:21 UTC

16 points

0 comments, 1 min read

Orthogonality: action counterfactuals

Stuart_Armstrong 17 Feb 2015 21:04 UTC

0 points

0 comments, 1 min read

Counterfactual do-what-I-mean

Stuart_Armstrong 27 Oct 2016 13:54 UTC

5 points

3 comments, 1 min read

Graphical World Models, Counterfactuals, and Machine Learning Agents

Koen.Holtman 17 Feb 2021 11:07 UTC

6 points

2 comments, 10 min read

Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.

RyanCarey 21 Oct 2018 12:03 UTC

23 points

1 comment, 6 min read

Counterfactual trade

owencb 9 Mar 2015 13:23 UTC

22 points

19 comments, 3 min read

A counterfactual and hypothetical note on AI safety design

Stuart_Armstrong 11 Mar 2015 16:20 UTC

13 points

1 comment, 1 min read

Counterfactual Induction (Lemma 4)

Diffractor 17 Dec 2019 5:05 UTC

4 points

0 comments, 7 min read

[Question] Would solving logical counterfactuals solve anthropics?

Chris_Leong 5 Apr 2019 11:08 UTC

20 points

52 comments, 1 min read

What is a Counterfactual: An Elementary Introduction to the Causal Hierarchy

Darmani 2 Jan 2022 3:46 UTC

11 points

2 comments, 5 min read

Safely controlling the AGI agent reward function

Koen.Holtman 17 Feb 2021 14:47 UTC

8 points

0 comments, 5 min read

Does TDT pay in Counterfactual Mugging?

Bongo 29 Nov 2010 21:31 UTC

4 points

5 comments, 1 min read

Counterfactual Induction

Diffractor 17 Dec 2019 5:03 UTC

23 points

7 comments, 6 min read

I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias

Kaj_Sotala 8 Mar 2012 5:39 UTC

86 points

40 comments, 9 min read

Two Alternatives to Logical Counterfactuals

jessicata 1 Apr 2020 9:48 UTC

39 points

61 comments, 5 min read

(unstableontology.com)

Counterfactual mugging: alien abduction edition

Emile 28 Sep 2010 21:25 UTC

4 points

18 comments, 1 min read

Logical counterfactuals and differential privacy

Nisan 4 Feb 2018 0:17 UTC

1 point

1 comment, 5 min read

Timeless Control

Eliezer Yudkowsky 7 Jun 2008 5:16 UTC

47 points

69 comments, 9 min read

Victor Porton 1 Sep 2023 8:21 UTC
1 point

I always thought that “counter-factual” means a message that does not conform to reality. Was my understanding of the semantics of this word wrong, or can your definition and my intuitive understanding be reconciled? Isn’t “counter-factual” in the sense of “contrary to past decisions” a special case of “counter-factual” in the sense of “not conforming to reality”? If so, can the word be used in both senses, depending on context?