I get that. What I’m really wondering is how this extends to probabilistic reasoning. I can think of an obvious analog. If the algorithm assigns zero probability that it will choose $5, then when it explores the counterfactual hypothesis “I choose $5”, it gets nonsense when it tries to condition on the hypothesis. That is, for all U,
P(utility=U | action=$5) = P(utility=U and action=$5) / P(action=$5) = 0/0
is undefined. But is there an analog for this problem under uncertainty, or was my sketch correct about how that would work out?
A causal reasoner will compute P(utility=U | do{action=$5}), which doesn’t run into this trouble. This is the approach I recommend.
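To make the contrast concrete, here is a minimal sketch (a toy model of my own, with made-up numbers and a made-up world model in which taking $X is worth X): for a deterministic agent whose self-model puts probability 0 on taking $5, the evidential conditional comes out 0/0, while the do-style evaluation just feeds the action into the world model and stays well-defined.

```python
# Toy illustration (not from the thread): a deterministic agent that always
# takes $10, and the two ways of evaluating the hypothetical "I take $5".

# Assumed world model: taking $X yields utility X.
world = {"$10": 10, "$5": 5}

# The agent's self-model assigns probability 0 to choosing $5.
policy = {"$10": 1.0, "$5": 0.0}

# Joint distribution P(action, utility) implied by policy + world model.
joint = {(a, world[a]): p for a, p in policy.items()}

def conditional_expected_utility(action):
    """Evidential-style: E[utility | action] via P(utility, action) / P(action)."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        return float("nan")          # 0/0: conditioning on a zero-probability action
    return sum(u * p for (a, u), p in joint.items() if a == action) / p_action

def interventional_expected_utility(action):
    """Causal-style: E[utility | do(action)] just runs the world model on the action."""
    return world[action]             # well-defined even if the policy never picks it

print(conditional_expected_utility("$5"))     # nan  (undefined)
print(interventional_expected_utility("$5"))  # 5    (no trouble)
```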
Probabilistic reasoning about actions that you will make is, to the best of my knowledge, not a seriously considered approach to making decisions outside of the context of mixed strategies in game theory, and even there it doesn’t apply that strongly, since you can see a mixed strategy as putting forth a certain (but parameterized) action whose outcome is subject to uncertainty.
I don’t think your sketch is correct for two reasons:
1. The assumption that your action is utility-maximizing requires that you choose the best action, and so using it to justify your choice of action leads to circularity.
2. Your argument hinges on P(U($10)>U($5)|A=$10) > P(U($5)>U($10)|A=$5), which seems like an odd statement to me. If you take the “actions maximize utility” assumption seriously, both of those are 1, and thus the first can’t be higher than the second. If you view the actions as not at all informative about the preference probabilities, then you’re just repeating your prior. If the action gives some information, there’s no reason for the information to be symmetric: you can easily construct a 2x2 matrix example where the reverse inequality holds (if we know someone picked $5, they are more likely to prefer $5 to $10 than someone who picked $10 is to prefer $10 to $5, even though most people prefer $10 to $5), as in the sketch below.
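On the second point, here is one such 2x2 construction (the numbers are mine, chosen only to exhibit the reversal): picking $5 is nearly decisive evidence of preferring $5, while picking $10 is diluted by noisy $5-preferrers, so the inequality reverses even though 75% of people prefer $10.

```python
# Toy 2x2 joint over (preference, action); illustrative numbers only.
# Preference "10" means U($10) > U($5).

p_prefers_10 = 0.75                      # most people prefer $10 to $5
p_act = {                                # P(action | preference)
    "10": {"$10": 0.99, "$5": 0.01},     # $10-preferrers almost always take $10
    "5":  {"$10": 0.50, "$5": 0.50},     # $5-preferrers are noisy
}

prior = {"10": p_prefers_10, "5": 1 - p_prefers_10}
joint = {(pref, act): prior[pref] * p_act[pref][act]
         for pref in prior for act in ("$10", "$5")}

def p_pref_given_action(pref, act):
    p_act_total = sum(p for (_, a), p in joint.items() if a == act)
    return joint[(pref, act)] / p_act_total

print(p_pref_given_action("10", "$10"))  # ~0.86: P(U($10)>U($5) | A=$10)
print(p_pref_given_action("5", "$5"))    # ~0.94: P(U($5)>U($10) | A=$5)
# The second is larger than the first, even though 75% prefer $10.
```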
What I am saying is that I don’t assume that I maximize expected utility. I take the five-and-ten problem as a proof that an agent cannot be certain that it will make the optimal choice, while it is choosing, because this leads to a contradiction. But this doesn’t mean that I can’t use the evidence that a choice would represent, while choosing. In this case, I can tell that U($10) > U($5) directly, so conditioning on A=$10 or A=$5 is redundant. The point is that it doesn’t cause the algorithm to blow up, as long as I don’t think my probability of maximizing utility is 0 or 1.
It’s true that A=$5 could be stronger evidence for U($5)>U($10) than A=$10 is for U($10)>U($5). But there’s no particular reason to think it would be. And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference. As long as making a choice is evidence for its being the optimal choice only insofar as I am confident that I make the optimal choice in general, it will provide equally strong evidence for every choice and cancel itself out. But in cases where a particular choice is evidence of good things for other reasons (like Newcomb’s problem), taking this evidence into consideration can affect my decision.
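A small sketch of what this looks like with a non-dogmatic self-model (the 0.9, the 0.95, and the placeholder utilities are my own illustrative assumptions, not anything the commenters specified): both conditionals are well-defined, and the high prior on U($10)>U($5) keeps $10 the winner under either conditional.

```python
# Toy self-model: the agent thinks it takes the optimal action with
# probability 0.9 -- neither 0 nor 1 -- and assigns prior 0.95 to
# U($10) > U($5).  Utilities below are illustrative placeholders.

p_10_better = 0.95          # prior P(U($10) > U($5))
p_act_optimally = 0.9       # self-model: P(I take whichever action is optimal)

# Joint P(which is better, action taken)
joint = {
    ("10_better", "$10"): p_10_better * p_act_optimally,
    ("10_better", "$5"):  p_10_better * (1 - p_act_optimally),
    ("5_better",  "$10"): (1 - p_10_better) * (1 - p_act_optimally),
    ("5_better",  "$5"):  (1 - p_10_better) * p_act_optimally,
}

# Placeholder utilities: you get 10 if you took the action that really is better.
utility = {("10_better", "$10"): 10, ("10_better", "$5"): 5,
           ("5_better",  "$10"): 5,  ("5_better",  "$5"): 10}

def expected_utility_given(action):
    p_a = sum(p for (_, a), p in joint.items() if a == action)   # never 0 here
    return sum(utility[(h, a)] * p
               for (h, a), p in joint.items() if a == action) / p_a

print(expected_utility_given("$10"))  # ~9.97
print(expected_utility_given("$5"))   # ~6.61 -- well-defined; $10 still wins
```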
So why can’t I just use the knowledge that I’ll go through this line of reasoning to prove that I will choose $10 and yield a contradiction? Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion. Now, there may be a shortcut I could use to prove what my choice will be, but the very fact that this would yield a contradiction means that no such proof exists in a consistent formal system.
(BTW, I agree that CDT is the only decision theory that works in practice as-is. I’m only addressing one issue with the various timeless decision theories.)
And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference.
Well, then why even update? (Or, more specifically, why assume that this is harmless normally, but an ace up your sleeve for a particular class of problems? You need to be able to reliably distinguish when this helps you and when this hurts you from the inside, which seems difficult.)
Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion.
I’m not sure that I understand this; I’m under the impression that many TDT applications require that they be able to simulate themselves (and other TDT reasoners) this way.
Good questions. I don’t know the answers. But like you say, UDT especially is basically defined circularly—where the agent’s decision is a function of itself. Making this coherent is still an unsolved problem. So I was wondering if we could get around some of the paradoxes by giving up on certainty.