What I am saying is that I don’t assume that I maximize expected utility. I take the five-and-ten problem as a proof that an agent cannot be certain, while it is choosing, that it will make the optimal choice, because that certainty leads to a contradiction. But this doesn’t mean that I can’t use the evidence that a choice would represent while choosing. In this case, I can tell that U($10) > U($5) directly, so conditioning on A=$10 or A=$5 is redundant. The point is that it doesn’t cause the algorithm to blow up, as long as I don’t think my probability of maximizing utility is 0 or 1.
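To make that concrete, here is a minimal sketch (my own toy model, with made-up numbers, not anyone's canonical formulation): an agent that refuses to assign probability 0 or 1 to its own actions, so conditioning on either action stays well-defined and nothing blows up.

```python
# Toy model: P(A=a) is strictly between 0 and 1 for every option, so
# E[U | A=a] never involves conditioning on a probability-zero event.

action_prob = {"$10": 0.9, "$5": 0.1}   # made-up self-model: never 0 or 1
payoff = {"$10": 10.0, "$5": 5.0}       # U is known directly in this case

def conditional_eu(action):
    # Since P(A=action) > 0, conditioning is legal; with U known directly,
    # it is also redundant, exactly as claimed above.
    assert 0.0 < action_prob[action] < 1.0, "stay uncertain about yourself"
    return payoff[action]

choice = max(payoff, key=conditional_eu)
print(choice)  # $10: reading off U($10) > U($5) yields no contradiction
```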
It’s true that A=$5 could be stronger evidence for U($5)>U($10) than A=$10 is for U($10)>U($5). But there’s no particular reason to think it would be. And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference. If making a choice is evidence that it is the optimal choice only insofar as I am confident that I make the optimal choice in general, then it provides equally strong evidence for every choice and cancels itself out. But in cases where a particular choice is evidence of good things for other reasons (as in Newcomb’s problem), taking this evidence into consideration can affect my decision.
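A toy Bayesian update shows the swamping numerically. Every number below is an assumption picked for illustration: B is the hypothesis U($10) > U($5), and the likelihoods encode "I usually pick the optimal option."

```python
# Illustrative Bayesian update: a strong prior for B swamps the evidence
# carried by the action itself. All numbers are made up for the example.

prior_B = 0.99            # P(B), B = "U($10) > U($5)", high a priori
p_pick10_given_B = 0.9    # P(A=$10 | B): I usually pick the optimal option
p_pick10_given_notB = 0.1 # P(A=$10 | not B)

def posterior_B_given(action):
    # Standard Bayes' rule over the two hypotheses B and not-B.
    like_B = p_pick10_given_B if action == "$10" else 1 - p_pick10_given_B
    like_notB = p_pick10_given_notB if action == "$10" else 1 - p_pick10_given_notB
    num = like_B * prior_B
    return num / (num + like_notB * (1 - prior_B))

print(posterior_B_given("$10"))  # ~0.9989: A=$10 confirms B only slightly
print(posterior_B_given("$5"))   # ~0.917: even A=$5 leaves B very likely;
                                 # the strong prior swamps the update
```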
So why can’t I just use the knowledge that I’ll go through this line of reasoning to prove that I will choose $10, and thereby derive a contradiction? Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion. Now, there may be a shortcut I could use to prove what my choice will be, but the very fact that such a proof would yield a contradiction means that no such proof exists in a consistent formal system.
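Here is a deliberately broken sketch of that regress (my own toy, not anyone's actual algorithm): to predict my choice I must run my whole decision procedure, and that procedure includes the prediction step itself.

```python
# Brute self-simulation never terminates: decide -> decide -> decide ...

import sys

def decide():
    prediction = decide()   # simulating myself includes this very call
    return prediction

sys.setrecursionlimit(100)  # keep the inevitable failure quick
try:
    decide()
except RecursionError:
    print("brute self-simulation bottoms out in infinite regress")
```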
(BTW, I agree that CDT is, as it stands, the only decision theory that works in practice. I’m only addressing one issue with the various timeless decision theories.)
> And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference.
Well, then why even update? (Or, more specifically, why assume that this is harmless normally, but an ace up your sleeve for a particular class of problems? You need to be able to reliably distinguish when this helps you and when this hurts you from the inside, which seems difficult.)
> Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion.
I’m not sure that I understand this; I’m under the impression that many TDT applications require that they be able to simulate themselves (and other TDT reasoners) this way.
Good questions. I don’t know the answers. But like you say, UDT especially is basically defined circularly: the agent’s decision is a function of itself. Making this coherent is still an unsolved problem. So I was wondering if we could get around some of the paradoxes by giving up on certainty.
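One way to see the circularity concretely: in Newcomb's problem, the payoff depends on a prediction of this very decision. The sketch below uses the standard payoffs, but the "pick the best diagonal outcome" resolution is just my illustration of one self-consistent reading, not UDT's actual definition.

```python
# Toy Newcomb setup: payoff[(my_action, predicted_action)], standard values.
# A perfect predictor forces prediction == action, so only the diagonal
# outcomes are reachable; picking the best diagonal entry is one (assumed)
# way to resolve the self-reference.
payoff = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,
    ("two-box", "one-box"): 1_001_000,
    ("two-box", "two-box"): 1_000,
}

best = max(("one-box", "two-box"), key=lambda a: payoff[(a, a)])
print(best)  # one-box: the self-consistent outcome with the higher payoff
```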