Counterfactuals from ensembles of peers

Bruno De Finetti once argued that probabilities are subjective representations of forecasts of real events. In particular, he rejected the meaningfulness of statements like “there is a random probability of this coin landing on heads when it is tossed, distributed according to some distribution”. For De Finetti, probabilities were subjective and had to refer to real events. He famously proved a representation theorem: given a subjective probability that is invariant under permutations, we can prove the existence of a random variable that represents “the probability of each event”. This theorem justified a common but metaphysically mysterious construction (random variables that represent “the probability of something”) by showing that they can be derived from random variables that represent actual events, provided those random variables are exchangeable. This concern is similar (but not identical) to Jaynes’ concern with the mind projection fallacy: roughly, if probabilities are subjective things, we should be careful not to start treating them as objective things.

Counterfactuals, I think, have some similarities to “probabilities of coins”. In particular, I have about as much difficulty understanding what “the probability this coin lands on heads” actually means as I do understanding what “the counterfactual outcome of xxx is yyy” means.

Chris says counterfactuals “only make sense from within the counterfactual perspective”. One interpretation of this is: we want a theory of what makes a counterfactual statement true or useful or predictive, and such a theory can only be constructed starting with the assumption that some other counterfactual statements are true.

I think one disanalogy between probability models and counterfactuals is that there are many reasonable attempts to propose abstract and widely applicable principles for constructing good probability models. Some examples: coherence principles, which imply that subjective uncertainty should be represented by probability models, and Solomonoff or maximum entropy priors.

On the other hand, there are few abstract and widely applicable theories of what makes a counterfactual hypothesis appropriate. Commonly, we rely on specific counterfactual intuitions: for example, that randomization guarantees that potential outcomes are independent of treatment assignment. As I understand it, David Lewis’ theory is one exception to this rule: it posits that we can evaluate the appropriateness of a counterfactual hypothesis by reference to a measure of “how similar” different “possible worlds” are. I am not an expert in Lewis’ work, but this attempt does not appear to me to have borne a lot of fruit.

I’m going to propose a theory of counterfactuals that says counterfactual hypotheses should be evaluated by reference to a set of similar agents. Like Lewis’ theory, this requires that we have some means of determining which agents are similar, and my theory has some serious gaps in this regard. It is somewhat similar to De Finetti’s strategy in that we take something which seems impossible to define for a single event (counterfactual hypotheses and probabilities of probabilities respectively) and instead define them with regard to sequences of events that are symmetric in an appropriate sense.

I’m going to say that a “counterfactual hypothesis” is a function f from a set of actions A to a set of consequences C. An agent equipped with such a function, where A matches the actions they actually have available, could use it to evaluate the prospects of each action (which may or may not be a wise choice). If we translate the evaluation process to English, it might consist of statements like “if I do a, the result will be f(a)”. This is “counterfactual” in the sense that the antecedent is in fact true only for one a (whether I know it or not) and false for every other action. Despite this, we want it to somehow say something meaningful for every a, so I can use it to choose good actions instead of bad ones.
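
To make this concrete, here is a minimal sketch in Python, under the assumption that the action set is finite and consequences are numbers we can compare. All the names here are my own illustration, not part of the theory:

```python
# A toy "counterfactual hypothesis": a map from actions to consequences.

def best_action(hypothesis):
    """Pick the action whose hypothesised consequence is largest.

    `hypothesis` maps each available action to the consequence the
    agent expects if it takes that action.
    """
    return max(hypothesis, key=hypothesis.get)

# "If I do a, the result will be hypothesis[a]" for every a, even
# though only one antecedent will actually come true.
hypothesis = {"press_red": 10, "press_green": 3}
print(best_action(hypothesis))  # -> press_red
```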

If you have read a lot of Pearl’s work, you might object that these things shouldn’t actually be called counterfactual, because we reserve that term for functions like f evaluated where we already know which action we actually took and which consequence we actually observed. My response is: the two things are obviously quite similar, and I think it’s reasonable to start with a simpler problem.

I avoid probability as much as I can, because it makes things more complicated. The result is that I make a bunch of completely unreasonable definitions about things being deterministically equal. I’m sorry.

Proposition: counterfactuals can be defined with respect to ensembles of peers

An agent is something that takes one action and experiences one consequence. We won’t dwell too much on what actions and consequences are; we’ll just consider them to be random variables. So we can think of an agent as just a pair (a, c) of an action and a consequence.

A set of peers is a set of agents such that all agents that take the same action experience the same consequence.
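
This condition is easy to state in code. A sketch, using the toy representation of agents as (action, consequence) pairs:

```python
def is_peer_set(agents):
    """True iff all agents taking the same action experience the same consequence."""
    seen = {}
    for action, consequence in agents:
        # setdefault returns the previously seen consequence for this
        # action, or records this one if the action is new.
        if seen.setdefault(action, consequence) != consequence:
            return False
    return True

print(is_peer_set([("a", 1), ("a", 1), ("b", 2)]))  # True
print(is_peer_set([("a", 1), ("a", 2)]))            # False
```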

We can then define a counterfactual function f:

(D1): a counterfactual appropriate for agent i is the map from actions to consequences defined by a set of peers P of i

An agent reasoning using this counterfactual function might say in English: “if I or an agent essentially identical to me does a, the result will be f(a).”
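
Under the toy representation of agents as (action, consequence) pairs, D1 amounts to reading the realised map off the peer set. A sketch (names are illustrative):

```python
def counterfactual_from_peers(peers):
    """D1: the action -> consequence map realised by a peer set."""
    f = {}
    for action, consequence in peers:
        # The peer condition guarantees this never overwrites a
        # different consequence for the same action.
        assert f.get(action, consequence) == consequence, "not a peer set"
        f[action] = consequence
    return f

peers = [("prescribe", "recovered"), ("withhold", "sick"),
         ("prescribe", "recovered")]
f = counterfactual_from_peers(peers)
print(f["prescribe"])  # recovered
```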

D1 is vague and nonunique. This seems problematic, but maybe it can still be useful. Even with the nonuniqueness, it may in some circumstances have nontrivial consequences. Consider the assumption

(A1): agent i has a set of peers P

D1 + A1 imply that agent i must expect consequences that some other agent has experienced or will experience. D1 + A1 is unreasonably strong, but we might imagine more reasonable versions if we can come up with good notions of approximate peers and approximately the same consequences.

Furthermore, given the consequences experienced by agents in P for each of agent i’s available actions, then given whatever action agent i takes we can say which consequence it will experience.

So far, this doesn’t seem too far from a dumb way of defining arbitrary functions which, for whatever reason, is restricted to functions that have already been realised by some collection of agents.

It becomes non-trivial if we suppose that it is possible to know that some set of agents P is a set of peers before it is possible to know f.

Here’s a sketch of how this could proceed:

We have a small set P0 of peers of i known a priori, and we observe that on all the actions taken by agents in P0, the consequences are the same as those experienced by a larger set of agents Q, with each agent in Q having its own small set of a priori peers. Then we take the union of all these a priori peer sets and posit that it is also a peer set for i. Importantly, Q features some actions that aren’t taken in the original P0, so we get some extrapolation to unseen actions.

P0 could be, for example, a collection of actions all taken by agent i, where we have unusually strong reasons to believe the consequences are consistent.

In reality, we would want to deal with stochastic consequences, and some kind of regularisation on allowable candidates for peer sets (i.e. if we iterate through the power set of agents, we’ll end up over-fitting). This is just a toy theory, and I don’t know if it works or not.
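
Here is a toy version of the learning sketch, assuming agents are (action, consequence) pairs and merging a candidate peer set only when it agrees with what we already have on at least one shared action; the regularisation just discussed is omitted:

```python
def is_peer_set(agents):
    """Equal actions must imply equal consequences."""
    seen = {}
    for a, c in agents:
        if seen.setdefault(a, c) != c:
            return False
    return True

def grow_peer_set(p0, candidates):
    """Merge candidate peer sets into p0 when they agree on a shared action."""
    merged = set(p0)
    for q in candidates:
        q = set(q)
        shared = {a for a, _ in merged} & {a for a, _ in q}
        if shared and is_peer_set(merged | q):
            merged |= q
    return merged

p0 = {("a", 1)}                       # a priori peers of agent i
candidates = [
    [("a", 1), ("b", 2)],             # agrees on "a": merged, extrapolates to "b"
    [("b", 3), ("c", 4)],             # conflicts on "b": rejected
]
print(sorted(grow_peer_set(p0, candidates)))  # [('a', 1), ('b', 2)]
```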

Aside: evolution and counterfactuals

Chris makes the somewhat offhanded remark

However, it would be inaccurate to present [counterfactuals] as purely a human invention as we were shaped by evolution in such a way as to ground these conceptions in reality.

If there is some weaker notion of peer which corresponds roughly with an agent’s competitors in an evolutionary environment, we divide agents into generations, and the consequence an agent in generation t experiences is the number of agents in generation t+1 that make choices the same way it does, then agents that make choices that maximise this number according to D1-like counterfactuals will be represented more in the next generation.

If there exists a unique procedure that, in all generations, learns a D1-like counterfactual and makes choices maximising descendants according to the learned rule, then agents implementing this procedure will become the most numerous.

Decision theoretic paradoxes

Decision theoretic paradoxes can be understood to be asking: which evaluative counterfactuals are appropriate in this situation? A1 and D1 don’t really tell us this in the classic paradoxes. Consider the following version of Newcomb’s problem:

  • We’ve just watched big mobs of people pick: some chose one box and some chose two

  • One-boxers always got $1m, two-boxers always got $100

  • We know the predictions were made before seeing the actions, and the rules were followed (both boxes filled if one box predicted, $1m box empty otherwise)

If we accept D1, then the problem is one of identifying a class of peers whose choices and consequences we can use to evaluate our own prospects. Some options are:

  1. None of them are peers (reject A1)

  2. If we one box then one-boxers are peers, if we two box then some two-boxers are

  3. One-boxers and two-boxers are peers

  4. Two-boxers are peers but not one-boxers

  5. Option 4 with one-box and two-box reversed

Option 1 leaves the appropriate counterfactual underdetermined: we do not see the consequences for one or both actions for any class of essentially identical agents. Option 3 implies one-boxing.
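
For concreteness, here is what option 3 looks like computationally: everyone we watched is a peer, so D1’s counterfactual is the realised payoff map, and maximising it means one-boxing. A toy calculation using the dollar amounts from the bullet points (the 50/50 split is invented):

```python
# Observed (action, payoff) pairs from the crowd we watched.
observed = [("one_box", 1_000_000)] * 50 + [("two_box", 100)] * 50

# Option 3: all of them are peers, so D1's counterfactual is the
# realised action -> payoff map, and we choose the best action.
f = {}
for action, payoff in observed:
    f[action] = payoff

best = max(f, key=f.get)
print(best, f[best])  # one_box 1000000
```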

Can we say anything about which options are more reasonable, and is thinking about it at all enlightening?

Option 1 seems excessively skeptical to me. Appealing to the idea of learning peers: usually whatever consequences I experience for taking an action, there’s someone else who’s tried it first and had the same experience.

Option 2 is equivalent to option 3. Why? Option 2 says: if we one-box, we get the same consequence as the one-box subset ($1m), and if we two-box we get the same consequence as the two-box subset ($100). By the definition of peers, that makes all of them our peers.

What about accepting option 4 and then going on to choose one box? It would be quite surprising if we learned that agents that tend to take the opposite action to us are the ones that get the same results.

So it seems like the reasonable advice, accepting D1, is to one-box. Does this mean D1 is equivalent to EDT? I think EDT broadly defined could be many things; who even knows what “the conditional probability” is, anyway? However, maybe this could be associated with a kind of frequentist EDT that defines probabilities via relative frequencies in populations of peers (maybe).

Mundane causation vs correlation

Does this definition handle mundane cases of causation and correlation? It seems like it might. Suppose we’re a doctor in a clinic seeing a patient who is sick and we’re deciding whether to make a prescription. We’ve seen a large collection of other doctor-patient data from regular consultations (i.e. non-randomised).

The reasoning for Newcomb’s problem can be applied analogously to suggest that if 100% of patients who received the prescription recover and 100% of those who did not receive it fail to recover, we should probably prescribe. I suspect that many would accept this in practice, too. The cases where it goes wrong seem to be some flavour of “everyone is trying to trick me”, and if that is plausible then the excessively skeptical option 1 from the Newcomb example seems somewhat reasonable in response.

What if, say, 75% of the treated recover and 25% of the untreated do, with 50% of patients treated and 50% not? One way to approach this is to notice that a class of peers is formally identical to a class of agents with the same potential outcome function, so you can basically do whatever potential outcomes practitioners do.
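
A rough sketch of that move: treat each candidate peer class as a stratum with its own potential outcome function, and compare treated against untreated within strata. The records, stratum labels, and numbers below are invented for illustration:

```python
def stratified_effect(records):
    """Average within-stratum difference in recovery rates.

    records: list of (stratum, treated: bool, recovered: bool).
    Each stratum stands in for a candidate peer class.
    """
    strata = {}
    for stratum, treated, recovered in records:
        strata.setdefault(stratum, {True: [], False: []})[treated].append(recovered)
    rate = lambda xs: sum(xs) / len(xs)
    effects = [rate(g[True]) - rate(g[False])
               for g in strata.values() if g[True] and g[False]]
    return sum(effects) / len(effects)

records = [("young", True, True), ("young", False, False),
           ("old", True, True), ("old", False, True)]
print(stratified_effect(records))  # 0.5
```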

If peers are learnable, we might be able to say something positive about which bits of data form an appropriate comparison set. Perhaps this gets us to some special case of data fusion.


The notion of defining counterfactuals via “actually existing ensembles of peers” has a lot of the same problems regular counterfactuals have. In our toy theory, there’s a peer set containing any two agents that took different actions (the peer condition is vacuously satisfied), which is similar to the problem with naive logical counterfactuals that, when given a false premise, tell us that any consequence is possible. I think this is the central difficulty: coming up with a notion of “peer” that allows some agents that take different actions to be peers, but doesn’t allow every such pair of agents to count as peers.

The difficulties with peer counterfactuals aren’t quite identical to those of naive logical counterfactuals, though: peer counterfactuals are restricted to consequences that have already been realised. I suppose the obvious strengthening of this idea is ε-peer counterfactuals, where permissible consequences must be experienced in at least a fraction ε of all cases, with ε > 0. This translates to the idea “I’ll get the same results as everyone else who tried the same thing”. This is actually a more precise statement of what people are usually rejecting when they say “causation ≠ correlation”.
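
A toy version of the ε-peer restriction, assuming stochastic consequences recorded as (action, consequence) pairs (the names and the choice of ε are mine):

```python
def eps_peer_consequences(agents, action, eps):
    """Consequences experienced in at least a fraction eps of the
    cases where `action` was taken."""
    outcomes = [c for a, c in agents if a == action]
    if not outcomes:
        return set()
    return {c for c in set(outcomes)
            if outcomes.count(c) / len(outcomes) >= eps}

agents = [("treat", "recover")] * 9 + [("treat", "sick")]
print(eps_peer_consequences(agents, "treat", 0.5))  # {'recover'}
```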