Random variables and Evidential Decision Theory

This post is inspired by the recent discussion I had with IlyaShpitser and Vaniver on EDT.

A random variable only ever has one value

In probability theory, statistics, and so on, we often use the notion of a random variable (RV). If you go look at the definition, you will see that an RV is a function on the sample space. What that means is that an RV assigns a value to each possible outcome of a system. In reality, where there are no closed systems, this means that an RV assigns a value to each possible universe.

For example, a random variable X representing the outcome of a die roll is a function of type “Universe → {1..6}”. The value of X in a particular universe u is then X(u). Uncertainty about X corresponds to uncertainty about which universe we are in. Since X is a pure mathematical function, its value is fixed for each input. That means that in a fixed universe, say our universe, such a random variable only ever takes on one value.

So, before the die roll, the value of X is undefined[1], and after the roll X is forever fixed. X is the outcome of one particular roll. If I roll the same die again, that doesn’t change the value of X. If you want to talk about multiple rolls, you have to use different variables. The usual solution is to use indices: X_1, X_2, etc.
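
To make this concrete, here is a minimal Python sketch of this view (the Universe type and its roll_outcomes field are invented for illustration): each random variable is a pure function of the universe, and each roll gets its own variable.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Universe:
    # Illustrative: pretend a universe fully specifies the history of die rolls.
    roll_outcomes: Tuple[int, ...]

def X_1(u: Universe) -> int:
    # The random variable for the first roll: a pure function of the universe.
    return u.roll_outcomes[0]

def X_2(u: Universe) -> int:
    # A *different* random variable for the second roll.
    return u.roll_outcomes[1]

u = Universe(roll_outcomes=(3, 5))
assert X_1(u) == 3 and X_2(u) == 5  # in a fixed universe, each value is fixed
```

Note that X_1 is a partial function in the sense of footnote [1]: in a universe where no die has been rolled, roll_outcomes is empty and X_1 has no value.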

This also means that the nodes in a causal model are not random variables. For example, in the causal model “Smoking → Cancer”, there is no single RV for smoking. Rather, the model is implicitly generalized to mean “Smoking_i → Cancer_i” for all persons i.
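
In the same illustrative style, the nodes of the model become indexed families of variables, one per person (the field names smokers and cancer_cases are again invented):

```python
def smoking_i(u, i):
    # Smoking_i: whether person i smokes in universe u.
    return u.smokers[i]

def cancer_i(u, i):
    # Cancer_i: whether person i gets cancer in universe u.
    # The model relates smoking_i(u, i) to cancer_i(u, i) for each person i;
    # there is no single universe-wide "Smoking" random variable.
    return u.cancer_cases[i]
```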

What this means for EDT

It is sometimes claimed that Evidential Decision Theory (EDT) cannot deal with causal structure. But I would disagree. To avoid confusion, I will refer to my interpretation as Estimated Evidential Decision Theory (EEDT).

Decision theories such as (E)EDT rely on the following formula to make decisions:

    V(a) = Σ_j P(O = o_j | a) · U(o_j)

where o_j are the possible outcomes, U(o_j) is the utility of an outcome, O is a random variable that represents the actual outcome, and a is an action. The (E)EDT policy is to take the action that maximizes V(a), the value of that action.
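
Transcribed directly into code, the decision rule looks like this (a sketch: estimate_prob stands for whatever procedure the agent uses to obtain P(O = o_j | a), which is exactly the part discussed below):

```python
def V(a, outcomes, estimate_prob, U):
    # Value of action a: the sum over outcomes o_j of P(O = o_j | a) * U(o_j).
    return sum(estimate_prob(o, a) * U(o) for o in outcomes)

def decide(actions, outcomes, estimate_prob, U):
    # The (E)EDT policy: take the action that maximizes V(a).
    return max(actions, key=lambda a: V(a, outcomes, estimate_prob, U))
```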

How would you evaluate this formula in practice? To do that, you need to know P(O = o_j | a), i.e. the probability of a certain outcome given that you take a certain action. But keep in mind the previous section! There is only one random variable O, the outcome of this particular action. Without assuming some prior knowledge, O is unrelated to the outcomes of other similar actions in similar situations.

At the time an agent has to decide what action a to take, the action has not happened yet, and the outcome is not yet known to him. This means that the agent has no observations of O. The agent therefore has to estimate P(O = o_j | a) using only his prior knowledge. How exactly this estimation is done is not specified by EEDT. If the agent wants to use a causal model, he is perfectly free to do so!

You might argue that by not specifying how the conditional probabilities P(O = o_j | a) are calculated, I have taken out the interesting part of the decision theory. With the right choice of estimation procedure, EEDT can describe CDT, normal/naive EDT, and even UDT[2]. But EEDT is not so general as to be completely useless. What it does give you is a way to reduce the problem of making decisions to that of estimating conditional probabilities.


Footnotes

1. Technically, ‘undefined’ is not in the codomain of X. What I mean is that X is a partial function on universes, or a function defined only on universes in which the die has been rolled.

2. To get CDT, assume there is a causal model for A → O, and use that to estimate P(O = o_j | do(A = a)). To get naive EDT, estimate the probabilities from data without taking causality or confounders into account. To get UDT, model A as being the choice of all sufficiently similar agents, not just yourself.
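
In terms of the decide sketch above, these three variants differ only in the estimate_prob argument that is plugged in. Here is a toy illustration with the naive-EDT estimator made concrete (the data and utilities are invented; the CDT and UDT estimators are only sketched in comments, since they require a causal model and a notion of “similar agents” respectively):

```python
from collections import namedtuple

Record = namedtuple("Record", ["action", "outcome"])

# Invented observational data.
data = [Record("smoke", "cancer"), Record("smoke", "healthy"),
        Record("abstain", "healthy"), Record("abstain", "healthy")]

def p_naive_edt(o, a):
    # Naive EDT: the raw conditional frequency P(O = o | A = a) in the data,
    # ignoring causal structure and confounders.
    matching = [d for d in data if d.action == a]
    return sum(d.outcome == o for d in matching) / len(matching)

# CDT would instead supply an estimator backed by a causal model:
#     p_cdt(o, a) = P(O = o | do(A = a))
# UDT would supply one that conditions on all sufficiently similar agents
# choosing a, not just this agent:
#     p_udt(o, a) = P(O = o | all similar agents choose a)

U = {"cancer": -100.0, "healthy": 0.0}
best = decide(["smoke", "abstain"], ["cancer", "healthy"], p_naive_edt, U.get)
assert best == "abstain"
```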