$1000 Prize: Circular Dependency of Counterfactuals

Congrats to the winner, TailCalled, for their post Some thoughts on “The Nature of Counterfactuals”. See the winner announcement post.

I’ve previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.

I will be awarding a $1000 prize for the best post that engages with the idea that counterfactuals may be circular in this sense. The winning entry may be one of the following (these categories aren’t intended to be exclusive):

a) A post that attempts to draw out the consequences of this principle for decision theory

b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective

c) A review of relevant literature in philosophy or decision theory

d) A post that restates already existing ideas in a clearer or more accessible manner (I don’t think this topic has been explored much on LW, but it may have been explored in the decision theory or philosophy literature)

Feel free to ask me for clarification about what would be on or off-topic. The main thing I’d like to see is substantial engagement with the principle that counterfactuals might only make sense from within a counterfactual perspective. I have written on this topic, but the competition isn’t limited to posts that engage with my views. It’s perfectly fine to engage with other arguments for this proposition if, for example, you find someone arguing in favour of it in the philosophical/mathematical literature or on Less Wrong.

If someone submits a high-quality post that only touches on this issue tangentially, while someone else submits a merely okayish post that engages deeply with it, then I would likely award the prize to the latter, as I’m trying to incentivise engagement with this specific issue rather than high-quality posts in general. If the bounty goes to an unexpected submission, I expect this to be the main cause.

I will be awarding an additional $100 for the best short-form post on this topic. This may be a LW Shortform post, a public Facebook post, a Twitter thread, etc. (I’m not going to include Discord/Slack messages as they aren’t accessible).

Why do I believe in this principle?

Roughly, my reasons are as follows:

  1. Rejecting David Lewis’ Counterfactual Realism as absurd and therefore concluding that counterfactuals must be at least partially a human construction: either a) in the sense of them being an inevitable and essential part of how we make sense of the world by our very nature or b) in the sense of being a semi-arbitrary and contingent system that we’ve adopted in order to navigate the world

  2. Insofar as counterfactuals are inherently a part of how we interpret the world, the only way that we can understand them is to “look out through them”, notice what we see, and attempt to characterise this as precisely as possible

  3. Insofar as counterfactuals are a somewhat arbitrary and contingent system constructed in order to navigate the world, the way the system is justified is by imagining adopting various mental frameworks and noticing that a particular framework seems like it would be useful over a wide variety of circumstances. However, we’ve just invoked counterfactuals twice: a) by imagining adopting different mental frameworks and b) by imagining different circumstances over which to evaluate these frameworks[1] (a toy sketch of this double invocation follows this list).

  4. In either case, we seem to be unable to characterise counterfactuals without depending on already having the concept of counterfactuals. Or at least, I find this argument persuasive.
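
To make the double invocation in point 3 concrete, here is a toy sketch in Python. Everything in it (the names `justify`, `imagine_circumstances`, `score`) is my own illustrative invention, not an actual proposal for how minds work:

```python
# A toy sketch of how a mental framework might get justified.
# All names here are illustrative inventions, not a real proposal.

def justify(candidate_frameworks, imagine_circumstances, score):
    """Pick the framework that scores best across imagined circumstances."""
    best, best_score = None, float("-inf")
    for framework in candidate_frameworks:        # (a) imagine adopting each framework
        total = sum(
            score(framework, world)
            for world in imagine_circumstances()  # (b) imagine circumstances to test it in
        )
        if total > best_score:
            best, best_score = framework, total
    return best
```

Both loops range over situations that don’t actually obtain: frameworks we haven’t adopted and circumstances we aren’t in. The justification procedure already presupposes the ability to entertain counterfactuals, which is the circularity.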

Why do I believe this is important?

I’ve argued for the importance of agent meta-foundations before. Roughly, there seems to be a lot of confusion about what counterfactuals are and how to construct them. I believe that much of this confusion would be cleared up if we can sort out some of these foundational issues. And the claim that counterfactuals can only be understood from an interior perspective is one such issue.

Why am I posting this bounty?

I believe in this idea, but:

  1. I haven’t been able to dedicate nearly as much time to exploring this as I would like in between all of my other commitments

  2. Working on this approach just by myself is kind of lonely and extremely challenging (for example, it’s hard to get good quality feedback)

  3. I suspect that more people would be persuaded that this is a fruitful approach if the principle were presented to them in a different light.

How do I submit my entry?

Make a post on LW or the Alignment forum, then add a link in the comments below. I guess I’m also open to private submissions. Ideally, you should mention that you’re submitting your post for the bounty just to make sure that I’m aware of it.

When do I need to submit by?

I’m currently planning to set the submission window to 3 months from the date of this post (that would be the 1st of April, but let’s make it April 2nd so people don’t think this competition is some kind of prank). Submissions after this date may be refused.

How will this be judged?

I’ve written on this topic myself, so this probably biases me in some ways, but $1000 is a small enough amount of money that it’s probably not worthwhile looking for external judges.

Some Background Info

I guess I began to believe that counterfactuals were circular when I started asking questions like, “What actually are these things we call counterfactuals?”. I noticed that they don’t seem to exist in a literal sense, yet we also seem unable to do without them.

Some people have asked (including in the comments below) why the Bayesian Network approach suggested by Judea Pearl is insufficient. This approach is firmly rooted in Causal Decision Theory (CDT), and most people on LW have rejected CDT because of its failure to handle Newcomb’s Problem.
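
For readers unfamiliar with why CDT fails there, here is a minimal sketch in Python. The payoff amounts and the 0.99 predictor accuracy are illustrative assumptions, not part of the problem’s canonical statement:

```python
# Newcomb's Problem: an accurate predictor fills an opaque box with
# $1,000,000 iff it predicted you would one-box; a transparent box
# always holds $1,000. The numbers below are illustrative assumptions.

ACCURACY = 0.99          # assumed predictor accuracy
SMALL, BIG = 1_000, 1_000_000

def cdt_value(action, p_big=0.5):
    # CDT: the boxes were filled before you chose, so your action can't
    # causally affect the contents. Under any fixed prior p_big,
    # two-boxing adds SMALL and therefore dominates.
    return p_big * BIG + (SMALL if action == "two-box" else 0)

def edt_value(action):
    # EDT: condition on the action. With an accurate predictor, choosing
    # to one-box is strong evidence that the opaque box is full.
    p_big = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_big * BIG + (SMALL if action == "two-box" else 0)

for action in ("one-box", "two-box"):
    print(f"{action}: CDT={cdt_value(action):,.0f}  EDT={edt_value(action):,.0f}")
# CDT recommends two-boxing (and walks away with ~$1,000 against a good
# predictor); EDT recommends one-boxing and typically gets ~$990,000.
```

The disagreement comes down to which counterfactuals a decision theory should use, which is exactly the kind of question this bounty is about.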

MIRI has proposed Functional Decision Theory (FDT) as an alternative, but this theory depends on logical counterfactuals, and they haven’t figured out exactly how to construct these. While I don’t entirely agree with the logical counterfactual framing, I agree that these kinds of exotic decision theory problems require us to create a new notion of counterfactuals. This naturally leads to questions about what counterfactuals really are, which I see as leading in turn to the conclusion that they are circular.

I can see why many people are sufficiently skeptical of the notion of counterfactuals being circular that they dismiss it out of hand. It’s entirely possible that I’m mistaken about this thesis, but to these people I’d suggest reading Eliezer’s post Where Recursive Justification Hits Bottom, which argues for a circular epistemology. If that post persuades you, then counterfactuals being circular may be less of a jump.

Fine Print

I’ll award the prize assuming that there’s at least one semi-decent submission (according to the standards of posts on Less Wrong). If this isn’t the case, then I’ll donate the money to an AI Safety organization instead. I’d be open to having this money be held in escrow.

I’m intending to award the prize to the top entry, but there’s a chance that I split it if I can’t make a decision.

  1. ^

    Counterpoint: requiring counterfactuals to justify their own use isn’t the same as counterfactuals only making sense from within themselves. Response: it’s possible to engage in the appropriate symbol manipulation without a concept of counterfactuals, but we can’t have a semantic understanding of what we’re doing. We can’t even describe this process without being able to say things like “if given string of symbols s, do y”. Similarly, counterfactuals aren’t just justified by imagining the consequences of applying different mental frameworks over different circumstances; in this framing, they simply are a system for performing well over a variety of circumstances.