Counterfactuals
Tag
Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau
20 Feb 2024 0:02 UTC
28
points
6
comments
10
min read
LW
link
Some thoughts on “The Nature of Counterfactuals”
tailcalled
16 Jan 2022 18:12 UTC
20
points
11
comments
11
min read
LW
link
Circular Counterfactuals “Only that which Happens is Possible”
JohnBuridan
23 Mar 2022 14:40 UTC
4
points
15
comments
9
min read
LW
link
Results: Circular Dependency of Counterfactuals Prize
Chris_Leong
5 Apr 2022 6:29 UTC
19
points
0
comments
1
min read
LW
link
Four factors that moderate the intensity of emotions
Ruby
24 Nov 2018 20:40 UTC
58
points
11
comments
8
min read
LW
link
Getting Unstuck on Counterfactuals
Chris_Leong
20 Jul 2022 5:31 UTC
7
points
1
comment
2
min read
LW
link
The Many Faces of Infra-Beliefs
Diffractor
6 Apr 2021 10:43 UTC
30
points
6
comments
63
min read
LW
link
Counterfactuals are Confusing because of an Ontological Shift
Chris_Leong
5 Aug 2022 19:03 UTC
17
points
35
comments
2
min read
LW
link
My Current Take on Counterfactuals
abramdemski
9 Apr 2021 17:51 UTC
53
points
57
comments
25
min read
LW
link
The Nature of Counterfactuals
Chris_Leong
5 Jun 2021 9:18 UTC
15
points
18
comments
4
min read
LW
link
Agency and the unreliable autonomous car
Alex Flint
7 Jul 2021 14:58 UTC
29
points
24
comments
10
min read
LW
link
Counterfactual Contracts
harsimony
16 Sep 2021 15:20 UTC
10
points
4
comments
9
min read
LW
link
(harsimony.wordpress.com)
Applying the Counterfactual Prisoner’s Dilemma to Logical Uncertainty
Chris_Leong
16 Sep 2020 10:34 UTC
9
points
5
comments
2
min read
LW
link
QACI: question-answer counterfactual intervals
Tamsin Leake
24 Oct 2022 13:08 UTC
22
points
0
comments
4
min read
LW
link
(carado.moe)
Decisions: Ontologically Shifting to Determinism
Chris_Leong
21 Dec 2022 12:41 UTC
8
points
11
comments
6
min read
LW
link
Causality and determinism in social science—An investigation using Pearl’s causal ladder
tailcalled
3 Jan 2022 17:51 UTC
12
points
10
comments
9
min read
LW
link
Counterfactuals from ensembles of peers
David Johnston
4 Jan 2022 7:01 UTC
3
points
4
comments
7
min read
LW
link
Counterfactuals for Perfect Predictors
Chris_Leong
6 Aug 2018 12:24 UTC
12
points
17
comments
6
min read
LW
link
Counterfactual Induction
Diffractor
17 Dec 2019 5:03 UTC
22
points
7
comments
6
min read
LW
link
Counterfactual Reprogramming Decision Theory
lukeprog
10 Sep 2012 1:35 UTC
18
points
8
comments
1
min read
LW
link
Counterfactual Calculation and Observational Knowledge
Vladimir_Nesov
31 Jan 2011 16:28 UTC
20
points
188
comments
1
min read
LW
link
Counterfactuals as a matter of Social Convention
Chris_Leong
30 Nov 2019 10:35 UTC
10
points
4
comments
2
min read
LW
link
Counterfactual trade
owencb
9 Mar 2015 13:23 UTC
21
points
19
comments
3
min read
LW
link
Counterfactual Mugging and Logical Uncertainty
Vladimir_Nesov
5 Sep 2009 22:31 UTC
11
points
21
comments
3
min read
LW
link
Counterfactuals: Smoking Lesion vs. Newcomb’s
Chris_Leong
8 Dec 2019 21:02 UTC
9
points
24
comments
3
min read
LW
link
Counterfactually uninfluenceable agents
Stuart_Armstrong
2 Jun 2017 16:17 UTC
11
points
0
comments
2
min read
LW
link
Counterfactuals and reflective oracles
Nisan
5 Sep 2018 8:54 UTC
9
points
0
comments
6
min read
LW
link
Counterfactual Induction (Algorithm Sketch, Fixpoint proof)
Diffractor
17 Dec 2019 5:04 UTC
5
points
2
comments
7
min read
LW
link
[Question]
Counterfactual Mugging: Why should you pay?
Chris_Leong
17 Dec 2019 22:16 UTC
6
points
59
comments
3
min read
LW
link
Counterfactual mugging: alien abduction edition
Emile
28 Sep 2010 21:25 UTC
4
points
18
comments
1
min read
LW
link
Counterfactual Induction (Lemma 4)
Diffractor
17 Dec 2019 5:05 UTC
4
points
0
comments
7
min read
LW
link
Counterfactual do-what-I-mean
Stuart_Armstrong
27 Oct 2016 13:54 UTC
5
points
3
comments
1
min read
LW
link
Counterfactual Mugging v. Subjective Probability
MBlume
20 Jul 2009 16:31 UTC
4
points
32
comments
1
min read
LW
link
Counterfactuals on POMDP
Stuart_Armstrong
2 Jun 2017 16:30 UTC
2
points
0
comments
2
min read
LW
link
Counterfactual self-defense
MrMind
23 Nov 2012 10:15 UTC
2
points
9
comments
1
min read
LW
link
Counterfactual do-what-I-mean
Stuart_Armstrong
27 Oct 2016 13:53 UTC
0
points
3
comments
1
min read
LW
link
Logical Counterfactuals and Proposition graphs, Part 1
Donald Hobson
22 Aug 2019 22:06 UTC
20
points
0
comments
2
min read
LW
link
Logical Counterfactuals are low-res
shminux
15 Oct 2018 3:36 UTC
23
points
14
comments
1
min read
LW
link
(donerkebabphilosophy.wordpress.com)
The Counterfactual Prisoner’s Dilemma
Chris_Leong
21 Dec 2019 1:44 UTC
21
points
17
comments
3
min read
LW
link
Logical Counterfactuals & the Cooperation Game
Chris_Leong
14 Aug 2018 14:00 UTC
16
points
26
comments
2
min read
LW
link
Logical Counterfactuals and Proposition graphs, Part 2
Donald Hobson
31 Aug 2019 20:58 UTC
13
points
0
comments
3
min read
LW
link
Can Counterfactuals Be True?
Eliezer Yudkowsky
24 Jul 2008 4:40 UTC
30
points
47
comments
4
min read
LW
link
A counterfactual and hypothetical note on AI safety design
Stuart_Armstrong
11 Mar 2015 16:20 UTC
13
points
1
comment
1
min read
LW
link
Logical Counterfactuals and Proposition graphs, Part 3
Donald Hobson
5 Sep 2019 15:03 UTC
6
points
0
comments
4
min read
LW
link
Extremely Counterfactual Mugging or: the gist of Transparent Newcomb
Bongo
9 Feb 2011 15:20 UTC
10
points
79
comments
1
min read
LW
link
Provability Counterfactuals vs Three Axioms of Galles and Pearl
IAFF-User-52
30 Aug 2015 2:48 UTC
6
points
0
comments
1
min read
LW
link
(epsilonofdoom.blogspot.com)
Logical counterfactuals for random algorithms
Vanessa Kosoy
6 Jan 2016 13:29 UTC
5
points
0
comments
10
min read
LW
link
[LINK] Counterfactual Strategies
Strilanc
17 Jun 2014 19:29 UTC
5
points
14
comments
1
min read
LW
link
Logical Counterfactuals Consistent Under Self-Modification
abramdemski
15 Dec 2015 6:38 UTC
3
points
2
comments
8
min read
LW
link
Logical counterfactuals and differential privacy
Nisan
4 Feb 2018 0:17 UTC
1
point
1
comment
5
min read
LW
link
What makes counterfactuals comparable?
Chris_Leong
24 Apr 2020 22:47 UTC
11
points
6
comments
3
min read
LW
link
The odd counterfactuals of playing chicken
Benya_Fallenstein
2 Feb 2015 7:15 UTC
6
points
0
comments
8
min read
LW
link
Hazing as Counterfactual Mugging?
SilasBarta
11 Oct 2010 14:17 UTC
5
points
8
comments
1
min read
LW
link
Third-person counterfactuals
Benya_Fallenstein
3 Feb 2015 1:13 UTC
4
points
4
comments
6
min read
LW
link
The many counterfactuals of counterfactual mugging
Scott Garrabrant
12 Apr 2016 20:04 UTC
2
points
3
comments
2
min read
LW
link
Stabilizing logical counterfactuals by pseudorandomization
Vanessa Kosoy
25 May 2016 12:05 UTC
1
point
2
comments
8
min read
LW
link
Un-manipulable counterfactuals
Stuart_Armstrong
12 Feb 2015 19:51 UTC
1
point
5
comments
1
min read
LW
link
Orthogonality: action counterfactuals
Stuart_Armstrong
17 Feb 2015 21:04 UTC
0
points
0
comments
1
min read
LW
link
Newcomblike problem: Counterfactual Informant
Clippy
12 Apr 2012 20:25 UTC
−3
points
24
comments
1
min read
LW
link
[Question]
Would solving logical counterfactuals solve anthropics?
Chris_Leong
5 Apr 2019 11:08 UTC
20
points
52
comments
1
min read
LW
link
Optimal and Causal Counterfactual Worlds
Scott Garrabrant
12 May 2015 3:16 UTC
14
points
4
comments
3
min read
LW
link
Sleeping Beauty gets counterfactually mugged
Stuart_Armstrong
26 Mar 2009 11:44 UTC
4
points
34
comments
2
min read
LW
link
Causal graphs and counterfactuals
Stuart_Armstrong
30 Aug 2016 16:12 UTC
7
points
2
comments
1
min read
LW
link
Transitive negotiations with counterfactual agents
Scott Garrabrant
20 Oct 2016 23:27 UTC
4
points
0
comments
1
min read
LW
link
Agents detecting agents: counterfactual versus influence
Stuart_Armstrong
18 Sep 2015 16:17 UTC
5
points
4
comments
7
min read
LW
link
Humans get different counterfactuals
Stuart_Armstrong
23 Mar 2015 14:54 UTC
4
points
2
comments
1
min read
LW
link
Causal graphs and counterfactuals
Stuart_Armstrong
30 Aug 2016 16:06 UTC
0
points
2
comments
1
min read
LW
link
The Curse Of The Counterfactual
pjeby
1 Nov 2019 18:34 UTC
124
points
34
comments
19
min read
LW
link
1
review
Two Alternatives to Logical Counterfactuals
jessicata
1 Apr 2020 9:48 UTC
38
points
61
comments
5
min read
LW
link
(unstableontology.com)
Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.
RyanCarey
21 Oct 2018 12:03 UTC
23
points
1
comment
6
min read
LW
link
Standard ML Oracles vs Counterfactual ones
Stuart_Armstrong
10 Oct 2018 20:01 UTC
18
points
5
comments
6
min read
LW
link
An environment for studying counterfactuals
Nisan
11 Jul 2018 0:14 UTC
15
points
6
comments
3
min read
LW
link
On the Role of Counterfactuals in Learning
Max Kanwal
11 Jul 2018 2:45 UTC
11
points
2
comments
3
min read
LW
link
Does TDT pay in Counterfactual Mugging?
Bongo
29 Nov 2010 21:31 UTC
4
points
5
comments
1
min read
LW
link
You have just been Counterfactually Mugged!
CronoDAS
19 Aug 2009 22:24 UTC
7
points
25
comments
1
min read
LW
link
[Question]
Decisions with Non-Logical Counterfactuals: request for input
reavowed
24 Oct 2019 17:23 UTC
3
points
11
comments
3
min read
LW
link
[Question]
What are some concrete problems about logical counterfactuals?
Chris_Leong
16 Dec 2018 10:20 UTC
25
points
4
comments
1
min read
LW
link
I Was Not Almost Wrong But I Was Almost Right: Close-Call Counterfactuals and Bias
Kaj_Sotala
8 Mar 2012 5:39 UTC
86
points
40
comments
9
min read
LW
link
Divergence on Evidence Due to Differing Priors—A Political Case Study
Davidmanheim
16 Sep 2019 11:01 UTC
27
points
3
comments
3
min read
LW
link
A useful level distinction
Charlie Steiner
24 Feb 2018 6:39 UTC
8
points
4
comments
2
min read
LW
link
JFK was not assassinated: prior probability zero events
Stuart_Armstrong
27 Apr 2016 11:47 UTC
37
points
38
comments
3
min read
LW
link
Motivating a Semantics of Logical Counterfactuals
Sam_A_Barnett
22 Sep 2017 1:10 UTC
22
points
3
comments
2
min read
LW
link
Open Problems Regarding Counterfactuals: An Introduction For Beginners
Diffractor
18 Jul 2017 2:21 UTC
21
points
6
comments
1
min read
LW
link
(www.overleaf.com)
UDT might not pay a Counterfactual Mugger
winwonce
21 Nov 2020 23:27 UTC
5
points
18
comments
2
min read
LW
link
Counterfactual Planning in AGI Systems
Koen.Holtman
3 Feb 2021 13:54 UTC
10
points
0
comments
5
min read
LW
link
Graphical World Models, Counterfactuals, and Machine Learning Agents
Koen.Holtman
17 Feb 2021 11:07 UTC
6
points
2
comments
10
min read
LW
link
Creating AGI Safety Interlocks
Koen.Holtman
5 Feb 2021 12:01 UTC
7
points
4
comments
8
min read
LW
link
Safely controlling the AGI agent reward function
Koen.Holtman
17 Feb 2021 14:47 UTC
8
points
0
comments
5
min read
LW
link
What is a Counterfactual: An Elementary Introduction to the Causal Hierarchy
Darmani
2 Jan 2022 3:46 UTC
11
points
2
comments
5
min read
LW
link
Initial Thoughts on Dissolving “Couldness”
DragonGod
22 Sep 2022 21:23 UTC
6
points
1
comment
3
min read
LW
link
[Sketch] Validity Criterion for Logical Counterfactuals
DragonGod
11 Oct 2022 13:31 UTC
6
points
0
comments
4
min read
LW
link
Against the normative realist’s wager
Joe Carlsmith
13 Oct 2022 16:35 UTC
16
points
9
comments
23
min read
LW
link
An Ontology for Strategic Epistemology
StrivingForLegibility
28 Dec 2023 22:11 UTC
9
points
0
comments
5
min read
LW
link
Why are counterfactuals elusive?
Martín Soto
3 Mar 2023 20:13 UTC
14
points
6
comments
2
min read
LW
link
Distributed Strategic Epistemology
StrivingForLegibility
28 Dec 2023 22:12 UTC
11
points
0
comments
3
min read
LW
link
Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility
19 Jan 2024 4:05 UTC
38
points
0
comments
7
min read
LW
link
Counterfactual Mechanism Networks
StrivingForLegibility
30 Jan 2024 20:30 UTC
4
points
0
comments
5
min read
LW
link
To Boldly Code
StrivingForLegibility
26 Jan 2024 18:25 UTC
25
points
4
comments
3
min read
LW
link
Incorporating Mechanism Design Into Decision Theory
StrivingForLegibility
26 Jan 2024 18:25 UTC
17
points
4
comments
4
min read
LW
link
Timeless Control
Eliezer Yudkowsky
7 Jun 2008 5:16 UTC
41
points
69
comments
9
min read
LW
link
Timeless Decision Theory and Meta-Circular Decision Theory
Eliezer Yudkowsky
20 Aug 2009 22:07 UTC
41
points
37
comments
10
min read
LW
link
Counterfactuals, thick and thin
Nisan
31 Jul 2018 15:43 UTC
28
points
11
comments
2
min read
LW
link
Deconfusing Logical Counterfactuals
Chris_Leong
30 Jan 2019 15:13 UTC
27
points
16
comments
11
min read
LW
link
Conditioning, Counterfactuals, Exploration, and Gears
Diffractor
10 Jul 2018 22:11 UTC
28
points
1
comment
5
min read
LW
link
Counterfactual Mugging Poker Game
Scott Garrabrant
13 Jun 2018 23:34 UTC
111
points
3
comments
1
min read
LW
link
Counterfactual Mugging
Vladimir_Nesov
19 Mar 2009 6:08 UTC
80
points
296
comments
2
min read
LW
link
Counterfactual Oracles = online supervised learning with random selection of training episodes
Wei Dai
10 Sep 2019 8:29 UTC
48
points
26
comments
3
min read
LW
link
Counterfactual outcome state transition parameters
Anders_H
27 Jul 2018 21:13 UTC
37
points
1
comment
6
min read
LW
link
Counterfactual resiliency test for non-causal models
Stuart_Armstrong
30 Aug 2012 17:30 UTC
34
points
78
comments
7
min read
LW
link
Counterfactuals versus the laws of physics
Stuart_Armstrong
18 Feb 2020 13:21 UTC
16
points
0
comments
1
min read
LW
link
Counterfactuals are an Answer, Not a Question
Chris_Leong
3 Sep 2019 15:36 UTC
14
points
6
comments
4
min read
LW
link