Meditation: So far, we’ve always pretended that you only face one choice, at one point in time. But not only is there a way to apply our theory to repeated interactions with the environment — there are two!
One way is to say that at each point in time, you should apply decision theory to the set of actions you can perform at that point. Now, the actual outcome of course depends not only on what you do now, but also on what you do later; but you know that you’ll still use decision theory later, so you can foresee what you will do in any possible future situation, and take it into account when computing which action you should choose now.
The second way is to make a choice only once, not between the actions you can take at that point in time, but between complete plans — giant lookup tables — which specify how you will behave in any situation you might possibly face. Thus, you simply do your expected utility calculation once, and then stick with the plan you have decided on.
Which of these is the right thing to do, if you have a perfect Bayesian genie and you want to steer the future in some particular direction? (Does it even make a difference which one you use?)
“Apply decision theory to the set of actions you can perform at that point” is underspecified — are you computing counterfactuals the way CDT does, or EDT, TDT, etc.?
This question sounds like a fuzzier way of asking which decision theory to use, but maybe I’ve missed the point.
I really like this trend of adding meditations to posts, asking people to figure something out not just on their own but here and out loud.
Does it matter if your utility function is constant with respect to time, provided that the most preferred outcome changes rarely?
There is no distinction between these. How do you construct this hypothetical lookup table? By applying decision theory to every possible future history. In other words, by applying option 1 to calculate out everything in advance. But why bother? Applying option 1 as events unfold will produce results identical to applying it to all possible futures now, and avoids the small problem of requiring vastly more computational resources than the universe is capable of holding, running extraordinarily faster than anything is capable of happening, and operating for gigantically longer than the universe will exist, before you can do anything.
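The claimed equivalence is easy to check concretely on a toy two-stage problem (the payoff table below is made up purely for illustration): deciding step by step while foreseeing the later choice, and brute-force search over complete lookup-table plans, land on the same expected utility.

```python
import itertools

# Toy sequential problem: choose a1, observe o (whose distribution
# depends on a1), then choose a2.  Utility depends on the full history.
ACTIONS = [0, 1]
OBS = [0, 1]

def p_obs(o, a1):
    # Hypothetical environment: a1 biases the observation toward itself.
    return 0.8 if o == a1 else 0.2

def utility(a1, o, a2):
    # Arbitrary payoff table, made up for this example.
    return {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 2, (0, 1, 1): 5,
            (1, 0, 0): 3, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 4}[(a1, o, a2)]

# Option 1: decide step by step, foreseeing the optimal later choice.
def stepwise_value():
    return max(sum(p_obs(o, a1) * max(utility(a1, o, a2) for a2 in ACTIONS)
                   for o in OBS)
               for a1 in ACTIONS)

# Option 2: choose once, among complete plans -- lookup tables mapping
# every possible observation to a second action.
def plan_value():
    return max(sum(p_obs(o, a1) * utility(a1, o, table[o]) for o in OBS)
               for a1 in ACTIONS
               for table in itertools.product(ACTIONS, repeat=len(OBS)))

print(stepwise_value() == plan_value())  # → True
```

Of course, this only demonstrates the equivalence for one example; the general argument is the one above, that the lookup table is constructed by running option 1 over every possible history in advance.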
Calculating the locally optimal action without any reference to plans can sometimes get you different results—see the absentminded driver problem.
I’m not convinced that the absentminded driver problem has such implications. Its straightforward (to me) resolution is that the optimal p is 2⁄3 by the obvious analysis, and that the driver cannot use alpha as a probability, for reasons set out here.
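For reference, the 2⁄3 figure falls straight out of the planning-stage expected-payoff calculation; here is a quick numerical check, using the standard payoffs from the problem (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both):

```python
# Absentminded driver (Piccione & Rubinstein): the driver continues
# with probability p at each intersection, and cannot tell which
# intersection he is at.  Payoffs: exit first = 0, exit second = 4,
# continue past both = 1.
def expected_payoff(p):
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# E(p) = 4p - 3p^2, so dE/dp = 4 - 6p = 0 gives p = 2/3 (value 4/3).
# A grid search agrees:
best_p = max((expected_payoff(i / 1000), i / 1000) for i in range(1001))[1]
print(best_p)  # → 0.667
```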
But I’d rather not get into a discussion of self-referential decision theory, since it doesn’t currently exist.