Some thoughts on “The Nature of Counterfactuals”

This is a response to the post The Nature of Counterfactuals by Chris_Leong. My post might not make much sense without having read Chris_Leong’s post first.

I have been thinking a lot about causality. Since there is now a potential bounty on talking about causality, this seems like a good time to say some things.

TL;DR:

  • Counterfactuals are something we’ve made up, but causality isn’t

  • Evolution “sniffs out” something resembling true counterfactuals by rolling out many organisms with slightly different DNA, thereby getting directly influenced by the causality of reality

  • If you select agents in a causal universe according to their ability to achieve certain consequences, as evolution does, then the optimal agents will use causal reasoning like Causal Decision Theory; presumably this is why humans do causal reasoning

  • We, for similar reasons, expect to have causal AI

  • Causality isn’t just about decisions; it can cover many other things too

Causality is real, counterfactuals are not

Our universe appears to be a causal universe. By this, I mean that it appears to be characterized by having some particular ‘starting state’, and evolving forwards as a dynamical process according to some causal laws. However, actually showing this takes some steps.

Correlation is not causation, but before we go to the causal aspect, let’s just consider the correlational aspect. Scientists have found that, to a good approximation, the universe follows physical laws that appear to be local[1]/continuous and translation-invariant[2] with respect to time and space. It also appears that the laws are deterministic, allowing the future to be predicted from the past.[3]
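
To make the locality/translation-invariance/determinism point concrete, here is a minimal toy sketch (not real physics; every detail is invented for illustration): a 1D “field” evolving under a single rule that depends only on each cell’s immediate neighbours, applies identically at every position, and fully determines the future from the present.

```python
import numpy as np

# Toy illustration (not real physics, details invented): a 1D "field" evolving
# under a law that is local (each cell depends only on its immediate neighbours),
# translation-invariant (the same rule applies at every site), and deterministic
# (the future is fully fixed by the present).

def step(state, alpha=0.25):
    """One update of a discretized diffusion-like rule, applied identically everywhere."""
    left = np.roll(state, 1)    # neighbour to the left (periodic boundary)
    right = np.roll(state, -1)  # neighbour to the right
    return state + alpha * (left + right - 2 * state)

state = np.zeros(100)
state[50] = 1.0  # a localized disturbance

for _ in range(200):
    state = step(state)  # deterministic evolution forwards in time
```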

But just because we have descriptive laws does not mean that these laws are actually causal generators; after all, they’re just correlations, so couldn’t there just as well have been a common cause, or something else like that which could have produced these results?

No; this is where entropy comes in. The laws of physics are reversible, such that if you rewind the universe, the same laws will still apply.[4] Counterintuitively,[5] they are also measure-preserving, so whenever we have some information about a specific time, such as what the state of the world is now, we have just as much information about the past and the future. However, in practice, the state of a system that is not continually replenished becomes less predictable over time[6], and if we look around the universe, this appears to imply that everything will eventually disintegrate into pure noise.

So imagine that the universe was not determined by ordinary forward-time causality; that the universe did not start with a low-entropy state that then evolved lawfully forwards over time. What else could it be? Well, given the time-reversibility of physics, we might imagine that it would have started with an ending state and evolved backwards—but due to entropy, that is nearly impossible, because the overwhelming majority of ending states do not have a corresponding low-entropy starting state. The universe would have needed to somehow select an ending state that had a corresponding low-entropy starting state, but how could it do so except by (causally) starting with a low-entropy state?

Or maybe we wouldn’t imagine time running backwards, because there’s also the possibility of common-cause causality, where a common factor caused both the far past and the far future. But the laws of physics are chaotic; a tiny change to the initial state would imply huge differences to later states. This is hard for common causes to capture, because it seems like they would need to have the entire trajectory of the universe pre-coded, which just raises the question of where the trajectory encoded by the common cause came from.

But who says it has to have any element of causality? For instance, why not just say that the universe’s trajectory was determined randomly, sampling from any universe trajectory that happens to follow the laws of physics? This is where entropy comes back; the vast majority of universe trajectories do not have a consistent progression in entropy, unlike ours.
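
Here is a small numerical sketch of the entropy point (illustrative only, not a claim about real physics): Arnold’s cat map is invertible and area-preserving, analogous to reversible, measure-preserving laws, yet a “low-entropy” initial blob of points spreads out until its coarse-grained entropy is near maximal, which is the asymmetry the argument above relies on.

```python
import numpy as np

# Sketch: Arnold's cat map is invertible and area-preserving (analogous to
# reversible, measure-preserving laws of physics), yet a "low-entropy" initial
# blob of points spreads out until its coarse-grained entropy is near maximal.

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 0.05, size=(100_000, 2))  # concentrated starting state

def cat_map(p):
    """One forward step of the (invertible, measure-preserving) cat map."""
    x, y = p[:, 0], p[:, 1]
    return np.stack(((x + y) % 1.0, (x + 2.0 * y) % 1.0), axis=1)

def coarse_entropy(p, bins=20):
    """Shannon entropy of the coarse-grained (binned) distribution of points."""
    hist, _, _ = np.histogram2d(p[:, 0], p[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    q = hist.flatten() / hist.sum()
    q = q[q > 0]
    return float(-(q * np.log(q)).sum())

for t in range(15):
    print(t, round(coarse_entropy(pts), 3))  # entropy rises, then saturates near log(400)
    pts = cat_map(pts)
```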

Essentially, the trajectory of the universe appears to have a rich structure which is very compatible with having been generated by a specific causal means, but which is not very compatible with alternative causal or non-causal explanations.

Finally, while causality definitely appears to be real, counterfactuals seem a lot more iffy. I would follow the Pearlian approach, which starts out with some causal/dynamical system, and then defines counterfactuals to be made-up/“mutilated” versions of that system. The reason we use counterfactuals in the Pearlian paradigm is that they are a convenient interface for “querying” the aggregated properties of causality.
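
As a minimal sketch of the Pearlian picture (variable names and probabilities invented for illustration): a causal model is a collection of mechanisms, and an interventional/counterfactual query is answered on a “mutilated” copy of the model where one mechanism is cut and replaced by a fixed value.

```python
import random

# Sketch of the Pearlian picture (variable names and probabilities invented):
# a causal model is a set of mechanisms, and an intervention is evaluated on a
# "mutilated" copy where one mechanism is cut and forced to a fixed value.

def sample(do_sprinkler=None):
    rain = random.random() < 0.3
    # Mutilation: if do_sprinkler is set, the sprinkler's own mechanism is ignored.
    sprinkler = (random.random() < 0.5) if do_sprinkler is None else do_sprinkler
    wet_grass = rain or sprinkler
    return rain, sprinkler, wet_grass

factual = [sample() for _ in range(10_000)]
mutilated = [sample(do_sprinkler=True) for _ in range(10_000)]
print(sum(w for _, _, w in factual) / len(factual))      # P(wet grass)
print(sum(w for _, _, w in mutilated) / len(mutilated))  # P(wet grass | do(sprinkler = on))
```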

Decision counterfactuals vs theory counterfactuals

When talking about counterfactuals, LessWrong tends to focus on a subset of counterfactuals that I would call “decision counterfactuals”. These are counterfactuals with respect to your own actions; e.g. “if I take a drink of my tea, then...”. However, this is not the only sort of counterfactual that is conceivable. You can also consider counterfactuals with respect to variables in the world, e.g. “if this battery was always fully charged, then...”.

I believe these counterfactuals to be useful in their own ways, e.g.:

  • Getting a gears-level understanding of the world. The universe seems to be built out of modular causal pieces. If one wants to understand the world, it seems inefficient to memorize the input/output relationships of your sense-data; instead, it seems much more efficient to learn the causal nature of the different pieces of the world, and to reason about the world by putting these pieces together.

  • AI alignment and corrigibility. For instance, humans value things like “freedom”. But like free will, freedom seems to me to be a counterfactual concept. A big part of what it means to be free is that you can influence things, which is a causal concept and thus requires counterfactuals to define. These counterfactuals are taken with respect to yourself, which from your perspective makes them decision counterfactuals. But for an AI to respect your freedom, it must evaluate the counterfactuals from its own perspective, even though the counterfactuals are with respect to you, and this means that they are not decision counterfactuals from the AI’s perspective.

  • Natural abstraction. If you want to define whether something is poisonous in terms of its chemical structure, you’re going to have a hard time. However, if you define it by how it would change a human’s health if a human ingested it, you’d have a much easier time.

Some of the special properties of counterfactuals don’t seem to hold for decision counterfactuals. For instance, conditionals are not the same as counterfactuals in general, but when it comes to decision counterfactuals, they often are. In fact, it’s been argued that conditionals should always match decision counterfactuals, though I’m not sure whether I buy that. Regardless of the relationship between decision counterfactuals and conditionals though, it’s definitely the case that most counterfactuals are not the same as conditionals, for standard “correlation != causation” reasons.
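
A toy illustration of that last point (model and numbers invented): in a confounded model, conditioning on a variable and intervening on it give different answers, so the conditional and the counterfactual come apart.

```python
import random

# Toy confounded model (invented for illustration): a hidden common cause drives
# both X and Y, so conditioning on X and intervening on X give different answers.

def world(do_x=None):
    confounder = random.random() < 0.5
    x = confounder if do_x is None else do_x  # X just copies the confounder unless intervened on
    y = confounder                            # Y is caused by the confounder, not by X
    return x, y

observations = [world() for _ in range(100_000)]
p_y_given_x = sum(y for x, y in observations if x) / sum(1 for x, _ in observations if x)

interventions = [world(do_x=True) for _ in range(100_000)]
p_y_do_x = sum(y for _, y in interventions) / len(interventions)

print(p_y_given_x)  # ~1.0: observationally, X = 1 "predicts" Y = 1
print(p_y_do_x)     # ~0.5: intervening on X does nothing to Y
```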

I will be focusing on decision counterfactuals in this post, because this seems to be what Chris_Leong is mostly talking about. But it should be noted that the counterfactuals I have been thinking about the most are more general counterfactuals that aren’t necessarily directly related to decisions. Quite probably, these general counterfactuals also need some more theory to ground them.

Decision counterfactuals require a Cartesian frame

I evaluate that it’s possible for me to drink the tea that is on my table right now. But if some other agent evaluated that it was possible for them to drink my tea that is on my table right now, I would say that this agent has a problem with their decision counterfactuals, because the agent is not present in my room and therefore cannot drink my tea.

When I decide what is possible for me, I draw a boundary between myself and the world, and then mentally vary what I do within the boundary, and cross the effects over the boundary into the world. But which boundary to use depends on who I am. (And on where I am, when I am, etc.)

That said, I can draw a Cartesian boundary around other agents and evaluate what things they could possibly do. Though actually, there are a number of different boundaries one could draw here. The boundary Chris_Leong suggests for justifying counterfactuals involves allowing me to entirely replace the other agents (or at least, replace their Cartesian frame); but this is not the sort of Cartesian frame I draw around myself (instead I imagine various actions/plans I could take), and it’s not the sort of Cartesian frame that would be practical to apply to other humans (where we should instead take their personalities, goals and abilities into account).

One place where Chris_Leong’s Cartesian frame is useful is for agent design—things like AI research, evolutionary psychology/biology, etc. For instance, when designing a transformative artificial intelligence, there is a very natural Cartesian frame to be drawn around the deployment of the AI; we can imagine substituting the code for the AI with any other piece of code, and it is extremely relevant to reason about what would happen if we did so. In a sense, we would treat the world as a function of the source code for the AI we deploy.
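
A minimal sketch of that frame (the world model and candidate policies here are stand-ins invented for illustration): we treat the outcome as a function of the deployed code, and we select the code whose deployment has the best consequences.

```python
# Minimal sketch of the "world as a function of the deployed code" frame.
# The world model and the candidate policies are stand-ins invented for illustration.

def world(policy):
    """Deploy `policy` and return the utility of the resulting trajectory."""
    observation = "sensor reading"
    action = policy(observation)
    return {"act cautiously": 5, "act boldly": 8}.get(action, 0)

candidates = {
    "cautious_ai": lambda obs: "act cautiously",
    "bold_ai": lambda obs: "act boldly",
}

# The design-time decision problem: which code should we deploy?
best = max(candidates, key=lambda name: world(candidates[name]))
print(best)  # the code whose deployment has the best consequences
```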

An optimality proof for Causal Decision Theory

Chris_Leong argues for a circular justification for causality:

You might think that the circularity is a problem, but circular epistemology turns out to be viable (see Eliezer’s Where Recursive Justification Hits Bottom). And while circular reasoning is less than ideal, if the comparative is eventually hitting a point where we can provide no justification at all, then circular justification might not seem so bad after all.

Perhaps some circularity is needed, but I think there is a part of the argument that can be done in a less circular way. In order to illustrate this, let’s consider the case of deploying an AI.

When deploying the AI, we might treat the future trajectory of the world as being a function of the code of the AI we are deploying. This gives us a decision problem; what code should we enter into the computer to achieve the outcomes we want?

Notice, importantly, that this is a different decision problem than any decision problem the AI ever faces. Usually the AI will face decision problems about what actions to take, essentially about what its output stream should be.[7]

But under some simplifying assumptions, we can prove that an optimal program must act as if it follows Causal Decision Theory. More specifically, if the world W is as if it was generated by some underlying dynamical/causal process, with the code running in a sufficiently isolated way, and we have two putative AIs A and B, whose I/O behavior differs only in that A follows Causal Decision Theory in more of its decisions, then in expectation A will outperform B. Proof sketch:

Suppose W specifies a world where the AI solely influences the world through the execution of the program, given some inputs. Suppose further that the outputs of A and B differ only in that A follows Causal Decision Theory in some contexts where B does not follow Causal Decision Theory. In that case, since Causal Decision Theory selects the actions which lead to the highest expected utility, the expected utility of deploying A exceeds that of deploying B.[8]
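
Here is a toy version of this proof sketch (all numbers invented): in a world where the deployed program only influences outcomes through the action it outputs in each context, enumerating every possible I/O behavior and picking the one with the highest expected utility yields exactly the policy that CDT would pick context by context.

```python
import itertools

# Toy version of the proof sketch (all numbers invented). The world W is causal:
# the deployed program only influences utility through the action it outputs in
# each context. We enumerate every possible I/O behaviour and check that the
# best one matches what Causal Decision Theory would do in each context.

contexts = ["sunny", "rainy"]
actions = ["A", "B"]
p_context = {"sunny": 0.7, "rainy": 0.3}
utility = {("sunny", "A"): 10, ("sunny", "B"): 4,
           ("rainy", "A"): 1, ("rainy", "B"): 6}

def expected_utility(policy):
    return sum(p_context[c] * utility[(c, policy[c])] for c in contexts)

# Every possible I/O behaviour the deployed code could have:
policies = [dict(zip(contexts, acts))
            for acts in itertools.product(actions, repeat=len(contexts))]
best_policy = max(policies, key=expected_utility)

# CDT, applied inside each context, picks the action with the highest causal expected utility there:
cdt_policy = {c: max(actions, key=lambda a, c=c: utility[(c, a)]) for c in contexts}

print(best_policy == cdt_policy)  # True: the optimal code acts as if it follows CDT
```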

Thus, a “narrow” Cartesian frame justifies a “broad” Cartesian frame; if you believe that you can influence the world through selecting a successor agent, then you also believe that this successor agent should reason causally.

Though critically, it depends on three assumptions:

  • Cartesian boundary—if the AI’s code does not just interact with the world through the I/O channel of the computer it is running on, but also through other means (e.g. maybe it’s up on a public GitHub repository), then it can face Newcomblike problems, where Causal Decision Theory fails (a toy version of this failure is sketched after this list). Therefore W must act as if generated by a dualistic causal process.

  • Unbounded computation—applying Causal Decision Theory exactly requires enormous computation, so there might be no pairs A, B that act the same except where A applies more Causal Decision Theory than B does, and there is almost certainly no code that always applies Causal Decision Theory exactly.

  • Consequentialist preferences—the code must be selected according to preferences over W, that is, according to preferences over the consequences of deploying the AI.
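
As promised above, here is a toy Newcomblike setup (all details invented) showing how the Cartesian boundary assumption can fail: the predictor can read the deployed code before the agent acts, so the code-selection frame and CDT’s in-the-moment reasoning come apart.

```python
# Toy Newcomblike setup (all details invented): the Cartesian boundary leaks,
# because the predictor can read the deployed policy's code before it acts.

def deploy(policy):
    predictor_predicts_one_box = (policy == "one_box")  # the leak: the code is visible
    opaque_box = 1_000_000 if predictor_predicts_one_box else 0
    if policy == "one_box":
        return opaque_box
    return opaque_box + 1_000  # "two_box" takes both boxes

# From the code-selection frame, one-boxing code clearly does better:
print(deploy("one_box"), deploy("two_box"))  # 1000000 vs 1000

# But CDT, reasoning inside the deployed frame, treats the box contents as already
# fixed by the time it acts, so it two-boxes (for any fixed contents, two-boxing
# gains an extra 1000). With a leaky boundary, that reasoning loses.
```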

None of the above assumptions are realistic, but I think the Cartesian boundary assumption is the most interesting one to look at. At first glance, it seems like it assumes the very strong forms of causality used by Causal Decision Theory. But that’s not quite true; the only actual causal thing it requires is that the deployed code influences the world according to the output of the function W. And while W must act as if it is generated by a highly restrictive causal process, it doesn’t have to be the case that said causal process is actually the thing generating it.

(Though for something as advanced as the universe, I have a hard time seeing how we could make sense of it as anything but the result of a causal process; that is, the result of starting with the state of the universe close to the big bang, and iterating forwards according to the laws of physics.)

Now the more critical thing is that of course, agents are not perfectly dualistically separated in the way that the Cartesian boundary assumption requires (nor do they have unbounded computation, which has similar consequences). I don’t think this problem is fundamentally unfixable; rather it probably just changes the decision theory you end up with, so that it is not quite the same as Causal Decision Theory, but some variant that better deals with various sorts of leakage.

But the most important point here to me is that if you already have a Cartesian frame that corresponds to the causal structure of the universe in some way, allowing some counterfactuals, then this imposes a much broader Cartesian frame that you have to enter into successor agents that you build.

The Cartesian frame of evolution

The basis of our human decision theory comes from evolution. And evolution is itself an agent-designing optimization process. Its Cartesian frame centers on genetics: the way it “figures out” counterfactuals is by deploying endless forms of organisms into the universe and seeing what happens; which ones reproduce?

Per the above argument, this Cartesian frame induces a corresponding Cartesian frame for humans. But evolution didn’t get its Cartesian frame from somewhere else; it’s just a fundamental property of the universe.[9]

In a sense, this gets rid of the need to use counterfactuals to justify counterfactuals, by transforming the claim from prescriptive to descriptive. Rather than saying “agents do better by adopting causal decision processes”, you can simply say “agents which adopt causal decision processes replicate more and so become the more common kind”; whether the replication happens because they adopt causal decision processes or merely as a side-effect doesn’t change the descriptive fact that this is what is happening. However, as a causally oriented agent, I find the causal description more enlightening; perhaps here Chris_Leong’s circularity sneaks back a bit.
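
A minimal sketch of that descriptive reading (numbers invented): in a population where reproduction tracks performance in a causal environment, policies that act like causal reasoners simply become the common kind, without anyone needing to justify counterfactuals first.

```python
# Sketch of the descriptive reading (numbers invented): replicator-style selection
# in a causal environment. Agents whose decisions track causal consequences have
# higher fitness, so they simply become the common kind over generations.

fitness = {"causal_reasoner": 1.2, "non_causal_reasoner": 1.0}  # offspring per generation
population = {"causal_reasoner": 10.0, "non_causal_reasoner": 990.0}

for generation in range(100):
    population = {kind: n * fitness[kind] for kind, n in population.items()}
    total = sum(population.values())
    population = {kind: 1000.0 * n / total for kind, n in population.items()}  # keep population size fixed

print(population)  # causal reasoners end up dominating
```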

One important thing that evolution has done is that it has changed every part of you to reflect and handle the environment your ancestors faced. And as far as I can tell, your ancestors faced a causal environment, the same causal laws of physics that we face now; so this is probably the reason that we have counterfactuals.

Thanks to Justis Mills for proofreading and feedback.

  1. ^

    I.e. where each location is only dependent on an infinitesimal region around that location.

  2. ^

    I.e. the same laws can apply to all locations at once, just with different states.

  3. ^

    This isn’t quite true. There’s quantum mechanics, which shows that there isn’t determinism with respect to our usual embodied view, and there’s chaos theory, which shows that even nominally deterministic laws are not predictable in practice. I think these effects are small enough that my argument still goes through, and many people also find it convincing that the problems can be patched (e.g. the many-worlds interpretation of quantum mechanics); but there could conceivably be a hole somewhere here.

  4. ^

    Strictly speaking, they are not wholly reversible; if you flip time, you need to also flip charge and parity; this is the CPT theorem. However, this doesn’t really change the point, and also under most conditions, this effect is negligible.

  5. ^

    This is contrary to everyday experience, where usually the future is hard to predict and records of the past may get lost; but the answer to the paradox lies in the fact that as the time distance increases, the information gets more “tangled up” in deeply nonlinear correlations. I like this diagram as an illustration of how it happens on a small scale, but on a large scale it is much more drastic. This creates a form of uncertainty that is impractical to think about or compute with, and which one must in practice simplify into a larger amount of more easily representable uncertainty; e.g. in the diagram, one could take the convex hull of the resulting shape.

  6. ^

    A simple example would be a heated house that is hotter than the outside. There are many more ways that the temperatures inside and outside of the house can be similar than ways they can be different, and this is why the temperature of a house tends to equilibrate with the temperature outside, unless you have something that can replenish the temperature difference, such as a heater or a heat pump.

  7. ^

    Sometimes the AI wouldn’t face choices about its actions in an ordinary sense, but instead face choices about self-modification and such. These choices are quite similar to our choices about what AI to deploy. However, they are distinct in some ways; e.g. the AI would self-modify at a later time than we deploy it. This might seem like nitpicking, but from a mathematical perspective, that is strictly speaking a different decision problem with a different Cartesian frame, as the Cartesian frame also specifies the time at which you make your decision.

  8. ^

    Of course, this proof is technically compatible with the AI following a different decision theory than CDT which just happens to always give the same result as CDT would give.

  9. ^

    I think? Quantum mechanics seems to do a similar “deploy many parallel individuals and pick the optimal” thing, which is what allows Lagrangian mechanics to work. Is evolution in some way dependent on the principle of least action? I don’t think so but I might be wrong.