philh comments on Kelly betting vs expectation maximization

philh 31 May 2023 10:33 UTC
4 points
0
I agree infinity is what makes things go weird here, but as you say, not particularly weirder for Bob than for Kelly-Betting Bob (who also never leaves the casino, and also wraps in a while my_money > 0 loop).

But what you say here seems to undermine your original comment:

The problem with maximising expected utility is that Bob will sit there playing 1 more round, then another 1 more round again and again until he eventually loses everything.

But KBB also sits there playing one more round, then another round. He doesn’t eventually lose everything, but he doesn’t leave either. This isn’t a problem with maximizing expected utility, it’s a problem with infinity.
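To make the comparison concrete, here is a minimal sketch of both bettors inside the same while my_money > 0 loop. The game itself is an assumption made for illustration (an even-money bet won with probability 0.6, so the Kelly fraction is 2 * 0.6 - 1 = 0.2), as are the function name play and the max_rounds cap, which exists only so the Kelly bettor's loop terminates.

```python
import random

def play(bet_fraction, p_win=0.6, start=100.0, max_rounds=1000, seed=0):
    """Run the `while my_money > 0` loop, capped at max_rounds.

    Assumed game (hypothetical): even-money bet, won with probability p_win.
    Returns (final_money, rounds_played).
    """
    rng = random.Random(seed)
    my_money = start
    rounds = 0
    while my_money > 0 and rounds < max_rounds:
        stake = bet_fraction * my_money
        if rng.random() < p_win:
            my_money += stake   # win: gain the stake
        else:
            my_money -= stake   # loss: lose the stake
        rounds += 1
    return my_money, rounds

# Expected-money maximizer: stakes everything each round.
all_in = play(bet_fraction=1.0)

# Kelly-Betting Bob: stakes 20% each round under the assumed odds.
kelly = play(bet_fraction=0.2)

print("all-in:", all_in)
print("kelly: ", kelly)
```

The all-in bettor leaves the loop at his first loss with nothing. The Kelly bettor only ever loses a fraction of his money, so his balance stays positive and the only thing that stops him here is the artificial cap. Neither of them has a rule for walking out with money in hand.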

At least to me Kelly betting fits in the same kind of space as Newcomb's paradox and (possibly) the prisoner's dilemma. They all demonstrate that the optimal policy is not necessarily given by a sequence of optimal actions at every step.

But with this setup, it only demonstrates that if we wave our hands and talk about what happens after playing infinitely many rounds of a game we never want to stop playing.

If we aren’t talking about something like that, then the optimal policy for the expected-money maximizer is given by taking the optimal action at every step.
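A small worked example of the finite-horizon claim, under assumptions that are not from the thread: an even-money bet won with probability 0.6, a fixed number of rounds, and the same fraction staked every round. The helper names expected_money and prob_ruin_all_in are made up for this sketch.

```python
def expected_money(bet_fraction, n_rounds, p_win=0.6, start=100.0):
    """Expected final money when staking bet_fraction of wealth each round.

    Each round multiplies wealth by (1 + f) with probability p_win and by
    (1 - f) otherwise. Rounds are independent, so the expectations multiply.
    """
    per_round = p_win * (1 + bet_fraction) + (1 - p_win) * (1 - bet_fraction)
    return start * per_round ** n_rounds

def prob_ruin_all_in(n_rounds, p_win=0.6):
    """Chance the all-in bettor has lost everything after n_rounds."""
    return 1 - p_win ** n_rounds

n = 10
fractions = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
best = max(fractions, key=lambda f: expected_money(f, n))

print("fraction maximizing expected money:", best)
print("expected money, all-in:", expected_money(1.0, n))
print("expected money, Kelly (f = 0.2):", expected_money(0.2, n))
print("P(all-in bettor is broke):", prob_ruin_all_in(n))
```

With a favourable bet, the per-round expected multiplier increases with the stake whatever the current wealth is, so staking everything each round is also what maximizes expected money over the whole finite game. The price is that almost all of that expectation sits in the rare world where every bet wins.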
- Ben 31 May 2023 11:54 UTC
  4 points
  0
  Yes, my position did indeed shift, as you changed my mind and I thought about it in more depth. My original position was very much pro-Kelly. On thinking about your points, I now think the while my_money > 0 aspect is where the problem really lies. I still stand by the distinction between the optimal global policy and the optimal action at each step, because at each step the optimal action (for Kelly or not) is to shake the dice another time. But if this is taken as a policy, the while my_money > 0 break condition becomes the only escape, which is clearly a bad policy. (It guarantees that in any world where we walk away, we walk away with nothing.)
  - philh 31 May 2023 15:19 UTC
    4 points
    0
    Nod. I think we basically agree at this point. Certainly I don’t intend to claim that optimal policy and optimal actions always coincide (I have more thoughts on that but don’t want to get into them).