What I’m saying is that the only way to solve any decision theory problem is to learn a causal model from data.
I think there are a couple of confusions this sentence highlights.
First, there are approaches to solving decision theory problems that don’t use causal models. Part of what has made this conversation challenging is that there are several different ways to represent the world, so even if CDT is the best or most natural one, it needs to be distinguished from other approaches. EDT is not CDT in disguise; the two are distinct formulas and distinct approaches.
Second, there are good reasons to modularize the components of the decision theory, so that you can treat learning a model from data separately from making a decision given a model. An algorithm to turn models into decisions should be able to operate on an arbitrary model, where it sees a → b → c as isomorphic to Drunk → Fall → Death.
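As a minimal sketch of that modularity (the function name, the node labels, and all of the probabilities below are made up for illustration), a routine that propagates probability along a chain of binary nodes never needs to know what the nodes mean. Relabeling a → b → c as Drunk → Fall → Death changes nothing about the computation:

```python
def chain_probability(model, order, first_value):
    """P(last node is True | first node = first_value) for a chain of binary nodes.

    model[child] maps the parent's value to P(child = True | parent value).
    order lists the node names from first to last; the names are only labels.
    """
    # Start with certainty about the value of the first node.
    dist = {first_value: 1.0, (not first_value): 0.0}
    for child in order[1:]:
        p_true = sum(dist[v] * model[child][v] for v in (True, False))
        dist = {True: p_true, False: 1.0 - p_true}
    return dist[True]


# Hypothetical numbers, chosen only to have something to compute with.
ABSTRACT = {"b": {True: 0.5, False: 0.1}, "c": {True: 0.2, False: 0.01}}
NAMED = {"Fall": ABSTRACT["b"], "Death": ABSTRACT["c"]}

p_abstract = chain_probability(ABSTRACT, ["a", "b", "c"], True)
p_named = chain_probability(NAMED, ["Drunk", "Fall", "Death"], True)
```

Both calls return the same number, because the decision machinery only ever sees the structure and the conditional probabilities, not the story attached to them.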
To tell an anecdote, when my decision analysis professor would teach that subject to petroleum engineers, he quickly learned not to use petroleum examples. Say something like “suppose the probability of striking oil by drilling a well here is 40%” and an engineer’s hand will shoot up, asking “what kind of rock is it?”. The kind of rock is useful for determining whether or not the probability is 40% or something else, but the question totally misses the point of what the professor is trying to teach. The primary example he uses is choosing a location for a party subject to the uncertainty of the weather.
It just doesn’t make sense to postulate particular correlations between an EDT agent’s decisions and other things before you even know what EDT decides!
I’m not sure how to interpret this sentence.
The way EDT operates is to perform the following three steps for each possible action in turn:
1. Assume that I saw myself doing X.
2. Perform a Bayesian update on this new evidence.
3. Calculate and record my utility.
It then chooses the possible action which had the highest calculated utility.
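The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the world is represented as a joint distribution over (action, outcome) pairs, and the name `edt_choice`, the toy distribution, and the utilities are all hypothetical rather than taken from any standard formulation.

```python
from collections import defaultdict


def edt_choice(joint, utility):
    """Pick the action with the highest expected utility conditional on taking it.

    joint maps (action, outcome) -> probability; utility maps (action, outcome) -> value.
    """
    # Marginal probability of each action, needed for the Bayesian update.
    action_mass = defaultdict(float)
    for (action, _), p in joint.items():
        action_mass[action] += p

    scores = {}
    for action, mass in action_mass.items():
        if mass == 0:
            continue  # cannot condition on an event with probability zero
        # Steps 1-2: condition on "I saw myself doing this action".
        # Step 3: record the expected utility under the updated distribution.
        scores[action] = sum(
            (p / mass) * utility[(a, o)]
            for (a, o), p in joint.items()
            if a == action
        )
    # Final step: choose the action with the highest recorded utility.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy model in which action "y" is correlated with the good outcome.
JOINT = {("x", "good"): 0.1, ("x", "bad"): 0.4, ("y", "good"): 0.4, ("y", "bad"): 0.1}
UTILITY = {("x", "good"): 1.0, ("x", "bad"): 0.0, ("y", "good"): 1.0, ("y", "bad"): 0.0}

best_action, action_scores = edt_choice(JOINT, UTILITY)
```

Note that the sketch treats the action as ordinary evidence and never asks why the action and the outcome are correlated, which is the feature the rest of this thread argues about.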
One interpretation is that you’re saying EDT doesn’t make sense, but I’m not sure I agree with what seems to be the stated reason. It looks to me like you’re saying “it doesn’t make sense to assume that you do X until you know what you decide!” I think that assumption does make sense; the problem is using it as Bayesian evidence, as if it were an observation.
The way EDT operates is to perform the following three steps for each possible action in turn:
1. Assume that I saw myself doing X.
2. Perform a Bayesian update on this new evidence.
3. Calculate and record my utility.
Ideal Bayesian updates assume logical omniscience, right? That includes knowledge of the logical fact of what EDT would do for any given input. If you know that you are an EDT agent, and condition on all of your past observations and also on the fact that you do X, but X is not in fact what EDT does given those inputs, then as an ideal Bayesian you will know that you’re conditioning on something impossible. More generally, the update you perform in step 2 depends on EDT’s input-output map, which makes the definition circular.
So, is EDT really underspecified? Or are you supposed to search for a fixed point of the circular definition, if there is one? Or does it use some method other than Bayes for the hypothetical update? Or does an EDT agent really break if it ever finds out its own decision algorithm? Or did I totally misunderstand?
Ideal Bayesian updates assume logical omniscience, right? That includes knowledge of the logical fact of what EDT would do for any given input.
Note that step 1 is “Assume that I saw myself doing X,” not “Assume that EDT outputs X as the optimal action.” I believe that excludes any contradictions along those lines. Does logical omniscience preclude imagining counterfactual worlds?
If I already know “I am EDT”, then “I saw myself doing X” does imply “EDT outputs X as the optimal action”. Logical omniscience doesn’t preclude imagining counterfactual worlds, but imagining counterfactual worlds is a different operation than performing Bayesian updates. CDT constructs counterfactuals by severing some of the edges in its causal graph and then assuming certain values for the nodes that no longer have any causes. TDT does too, except with a different graph and a different choice of edges to sever.
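To make the difference between the two operations concrete, here is a small sketch. The three-node model (a hidden trait H that influences both the action A and the outcome Y), the function names, and every number are assumptions invented for illustration. Conditioning on the action updates beliefs about H; intervening severs the H → A edge, so H keeps its prior.

```python
# Hypothetical model: H -> A and H -> Y, with an A -> Y edge that happens to have no effect.
P_H = {True: 0.3, False: 0.7}            # prior P(H)
P_A_GIVEN_H = {True: 0.9, False: 0.2}    # P(A = True | H)
P_Y_GIVEN_AH = {                         # P(Y = True | A, H); identical for both values of A
    (True, True): 0.8, (False, True): 0.8,
    (True, False): 0.1, (False, False): 0.1,
}


def p_action(a, h):
    """P(A = a | H = h)."""
    return P_A_GIVEN_H[h] if a else 1.0 - P_A_GIVEN_H[h]


def p_y_given_observed_action(a):
    """Bayesian update: seeing A = a is evidence about the hidden trait H."""
    evidence = sum(P_H[h] * p_action(a, h) for h in P_H)
    return sum(
        (P_H[h] * p_action(a, h) / evidence) * P_Y_GIVEN_AH[(a, h)] for h in P_H
    )


def p_y_given_do_action(a):
    """Intervention: the H -> A edge is cut, so H is still distributed as its prior."""
    return sum(P_H[h] * P_Y_GIVEN_AH[(a, h)] for h in P_H)


observed_true = p_y_given_observed_action(True)
observed_false = p_y_given_observed_action(False)
do_true = p_y_given_do_action(True)
do_false = p_y_given_do_action(False)
```

In this toy model the intervention gives the same probability of Y for either action, while the Bayesian update gives different probabilities, because the observed action carries information about H even though it does not cause Y.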
I don’t know how I can fail to communicate so consistently.
Yes, you can technically apply “EDT” to any causal model or (more generally) joint probability distribution containing an “EDT agent decision” node. But in practice this freedom is useless, because to derive an accurate model you generally need to take account of a) the fact that the agent is using EDT and b) any observations the agent does or does not make. To be clear, the input EDT requires is a probabilistic model describing the EDT agent’s situation (not describing historical data of “similar” situations).
There are people here trying to argue against EDT by taking a model describing historical data (such as people following dumb decision theories jumping into volcanoes) and feeding this model directly into EDT, which is simply wrong. A model that describes the historical behaviour of agents using some other decision theory does not in general accurately describe an EDT agent in the same situation.
This egregious mistake looks perfectly normal only because CDT doesn’t care about causal parents of the “CDT decision” node.
I don’t know how I can fail to communicate so consistently.
I suspect it’s because what you are referring to as “EDT” is not what experts in the field use that technical term to mean.
nsheppard-EDT is, as far as I can tell, the second half of CDT. Take a causal model and use the do() operator to create the manipulated subgraph that would result from taking a possible action (as an intervention). Determine the joint probability distribution from the manipulated subgraph. Condition on observing that action with the joint probability distribution, and calculate the probabilistically-weighted mean utility of the possible outcomes. This is isomorphic to CDT, and so referring to it as EDT leads to confusion.
Whatever. I give up.