It’s a good idea to look at Wei’s posts, of course, but in terms of presentation, the original UDT post is a very long way away from mine, and it won’t immediately be evident why I phrased my definition of UDT as I did.
If you want to understand my post purely on its own terms, then the key concept (besides probability and conditional probability) is just that of a game. If we have a one-player game, and we fix the player’s strategy, then we obtain a probability distribution over ‘branches’, and a utility lying at the end of each branch. And these are exactly the ingredients we need to calculate an expected utility. So UDT is simply the instruction ‘choose the strategy that yields the greatest expected utility’. The reason why it’s “updateless” is that the probability distribution with respect to which we’re calculating expected utilities is the ‘prior’ rather than the ‘posterior’: we haven’t ‘conditioned on’ the subset of branches that pass through a particular information state.
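To make this concrete, here is a minimal sketch of that recipe applied to Newcomb’s Problem. The payoffs ($1M in the opaque box if one-boxing was predicted, $1k in the transparent box) and the 99% predictor accuracy are the standard assumed values, and the function names are illustrative, not from the original post. Each strategy induces a distribution over branches (predictor right or wrong), each branch ends in a utility, and UDT just maximizes the prior expectation:

```python
def expected_utility(strategy, accuracy=0.99):
    """Prior expected utility of a fixed strategy in Newcomb's Problem.

    Branches: the predictor is correct with probability `accuracy`.
    """
    if strategy == "one-box":
        # Predictor right: opaque box holds $1M.  Wrong: it's empty.
        return accuracy * 1_000_000 + (1 - accuracy) * 0
    else:  # "two-box"
        # Predictor right: only the $1k transparent box pays out.
        # Wrong: opaque box also holds $1M.
        return accuracy * 1_000 + (1 - accuracy) * 1_001_000

# UDT: pick the strategy with the greatest prior expected utility.
best = max(["one-box", "two-box"], key=expected_utility)
```

With these numbers, one-boxing’s prior expected utility ($990,000) dominates two-boxing’s ($11,000), so `best` is `"one-box"`.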
For each of Newcomb’s Problem, Parfit’s Hitchhiker, Counterfactual Mugging and the Absent-Minded Driver, there is a sense in which when you ‘condition on the blue box’ you choose a different strategy than when you don’t. (This is paradoxical because, intuitively, what you ought to decide to do at a given time shouldn’t depend on whether you’re contemplating the decision from afar, timelessly, or actually there ‘in the moment’.)
(Technical Note: The concept of ‘conditioning on the blue box’ can be a bit more complicated than just ‘conditioning on an event’. For instance, in the case of Newcomb’s problem, you find that one-boxing is optimal if you don’t condition on anything, but two-boxing is optimal if you condition on the sigma-algebra generated by the event ‘predictor predicts that you will one-box’.)
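A hedged sketch of the contrast in the technical note, using the same assumed Newcomb payoffs as above. Once you condition on the prediction, that is, treat the opaque box’s contents as already fixed, two-boxing is better whichever prediction was made, even though one-boxing wins on the prior:

```python
def utility(action, predicted_one_box):
    """Utility of an action once the prediction (box contents) is fixed."""
    opaque = 1_000_000 if predicted_one_box else 0   # fixed by the prediction
    bonus = 1_000 if action == "two-box" else 0      # transparent box
    return opaque + bonus

# Conditional on either prediction, two-boxing gains a strict $1k:
two_box_dominates = all(
    utility("two-box", p) > utility("one-box", p) for p in (True, False)
)
```

This is the sense in which conditioning changes the chosen strategy: relative to the prior over branches, one-boxing is optimal, but relative to either element of the partition generated by the predictor’s prediction, two-boxing is.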