Right, this seems really good! (It deserves more upvotes than it has; it’s suitably mind blowing ;p)
There are some arbitrary choices (this is a generalization of expectation maximization, not an argument from expectation maximization, so it’s not surprising that it’s not a unique solution), but the only really arbitrary seeming port is the choice about how to order the limits. And as discussed in the other comment thread here, Eigil’s comment about lim(EV) vs EV(lim) makes your choice of ordering seem like the more appropriate one—your version matches up with the fact that EV of the all-in strategy for the infinite game is zero, while allowing us to evaluate strategies for the cases where EV of the infinite game is not well-defined.
The alternate intuition is that, since EV of infinite games is problematic, we should just compare the EV of strategies on very large numbers of iterations. This is basically your alternate limit ordering, and Eigil’s lim(EV) as opposed to EV(lim). And “the boring option”.
I think the boring option has some a priori superiority, but loses on net provided you’re right about the version of the game which has a small chance of ending at each round. I think it’s analogous to the following argument about Prisoner’s Dilemma. The argument is between a Strawman Economist and a Steelman Douglas Hofstadter.
ECONOMIST: The normatively correct thing to do in Prisoner’s Dilemma (PD) is to defect.
DOUGLAS: But in iterated PD, players can use the tit-for-tat strategy. If they do, it’s rational for both of them to cooperate, and for both of them to continue using tit-for-tat. And most real PDs can be considered as iterated.
E: Ahh, true, but no game is really infinitely iterated. We know it stops at some point. At that point, there’s no remaining incentive to cooperate. So both players should defect. Knowing this, players should actually think of the tit-for-tat chain as stopping one step earlier than this. But then the second-to-last move also becomes defect. And so on. The tit-for-tat strategy unravels all the way back to the beginning, and we’re back at total defection.
D: Ahh, true, but in practice we’re uncertain about when the game will end! Depending on our uncertainty, this can rescue tit-for-tat. So what we really get is a specific crossover point. If we’re sufficiently certain about when the game will end, you are correct. If we’re sufficiently uncertain, then I’ll be correct instead.
E: Damn, you’re right!
Similarly with straw economist and steel Kelly:
E: The rational way to evaluate bets is by taking the expectation. If a bet is worthwhile at all, it’s worth going all-in, if the other side will accept that large of a bet.
K: Wait, look! In an infinitely iterated game, Bunthut’s generalization of expectation maximization says to use my Kelly Criterion. And betting really is an iterated game. You shouldn’t consider each bet in isolation.
E: Why are the limits ordered that way?
K: As Eigil commented elsewhere, lim(EV) doesn’t equal EV(lim). And in fact the EV of the all-in strategy in the infinitely iterated case is zero. So this ordering of limits is the one that generalizes EV. The other ordering prefers the all-in strategy, even for the infinite game, so it can’t be a valid generalization of EV.
E: OK true, but consider this: I’m only going to make a finite number of bets in my life. Maybe I play the stocks for several decades, but then I retire; the size of my nest egg at retirement is what I care about. Your formula agrees with EV maximization in finite cases, so it must agree that I should use the all-in strategy here.
K: Suuure, but consider this: you don’t generally know when you’ll make your last bet. You probably won’t stop playing the stocks when you retire, and few anticipate the exact day they die. If we incorporate that uncertainty, we get behavior resembling EV maximization when we’re sufficiently certain of the game’s end, but we get behavior resembling Kelly when we’re sufficiently uncertain.
E: Damn, you’re right!
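(As an aside, the lim(EV) vs EV(lim) point is easy to check numerically. Here is a minimal sketch; the 60%-win double-or-nothing bet is a toy choice of mine, not anything from the post. The expectation of the all-in strategy grows without bound, while the probability of not having gone bust shrinks to zero, which is why the almost-sure limiting wealth is zero.)

```python
# lim(EV) vs EV(lim) for the all-in strategy, checked numerically.
# Assumed toy bet: double-or-nothing at 60% win probability (my choice of numbers).
p_win = 0.6

for n in (10, 50, 100):
    ev = (2 * p_win) ** n     # expected wealth after n all-in rounds: grows without bound
    p_solvent = p_win ** n    # probability of never having lost: goes to zero
    print(f"n={n:3d}  EV={ev:.3e}  P(still solvent)={p_solvent:.3e}")
```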
So my takeaway is: your argument about the 1%-probability-of-ending case is a crux for me. It makes the difference between this being a clever but rarely-applicable analysis of an infinite game, vs a frequently-applicable analysis of games with uncertain end. I’d really like to see how that works out.
I’m also curious whether this can be applied to other problems, like the St. Petersburg Lottery.
but the only really arbitrary seeming port is the choice about how to order the limits.
Deciding that the probability of overtaking needs to be >0 might also count here. The likely alternative would be ≥0.5. I’ve picked 0 because I expect that normally this limit will be 0, 0.5, or 1, and if we get other values then ≥0.5 might lead to intransitive comparisons—but I might rethink this if I discover a case where the limit isn’t one of those three.
The tit-for-tat strategy unravels all the way back to the beginning, and we’re back at total defection.
Is even that true? Yes, defecting in the last n rounds dominates defecting in the last n−1 rounds, but between defecting in the last 5 rounds and playing tit-for-tat all the way, the equilibrium can go either way depending on the initial population. So for all I know there might be a stable equilibrium involving only strategies that cooperate until close to the end.
I’m also curious whether this can be applied to other problems, like the St. Petersburg Lottery.
I would think so. We can model the St. Petersburg game as the limit of a sequence of n-Petersburg games, just like the original but ending after n turns regardless of the coin tosses. I still can’t fully solve this analytically, but I can lower-bound it: Consider a price p for the Petersburg game that is higher than the minimum $2 you get. Then a sample of n people need to earn more than p*n for the game to be worth it. The probability that one bettor earns that much or more (provided the game has enough rounds for that) is (1/2)^(log₂(pn)), or equivalently 1/(pn). The probability that at least one of the n participants manages this is 1 − (1 − 1/(pn))^n, and as n goes to infinity that comes out as 1 − e^(−1/p), a finite value larger than 0. So that sort of is a case where a probability other than 0, 0.5, or 1 occurs—sort of, because one bettor getting very lucky is only one way the betting team can win. But this tells us that, according to my definition above, the Petersburg bet is at least as good as any finite fixed price (“at least” again because that’s only one way to win), which, if it isn’t paradoxical, means that it’s more valuable than any fixed price.
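A quick numerical sanity check of that limit (my sketch; the price p = 10 is an arbitrary choice): 1 − (1 − 1/(pn))^n should approach 1 − e^(−1/p) as n grows.

```python
import math

p = 10.0  # hypothetical price per game, above the $2 minimum payoff

# P(at least one of n bettors alone covers the team's total cost p*n)
for n in (10, 1_000, 1_000_000):
    print(n, 1 - (1 - 1/(p*n))**n)

print("limit 1 - e^(-1/p):", 1 - math.exp(-1/p))
```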
We can also modify the Petersburg game to triple the payoff after every coin toss instead of doubling it. In that case the probabilities above become (1/2)^(log₃(pn)) = (1/(pn))^(log₃2) for the single bettor, and 1 − (1 − 1/(pn)^(log₃2))^n for the team, which comes out to 1 in the limit, since the exponent log₃2 < 1 means the single-bettor probability shrinks more slowly than 1/n. So that version is definitely more valuable than any fixed price. I think the original doubling version is a bit strange here, because 1/x behaves differently from the other powers.
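The tripled version’s limit can be checked the same way (again my sketch, with an arbitrary price p = 10): because the exponent log₃2 is below 1, the team probability climbs toward 1 as n grows.

```python
import math

p = 10.0                 # hypothetical price
a = math.log(2, 3)       # log_3(2) ~ 0.63, the exponent in the tripled game

for n in (10, 10_000, 10_000_000):
    q = 1 / (p * n) ** a          # single-bettor probability of covering p*n
    print(n, 1 - (1 - q) ** n)    # team probability: tends to 1
```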
I have also by now done some Monte Carlo tests (the distributions here have infinite mean and variance, but I should be allowed to sample with this method—for essentially the same reasons that pulling the comparison into the limit works). These indicate that the original Petersburg game is more valuable than $6 and likely more valuable than $100 (it definitely wins more than the bound I gave above suggests). I’ve also tested the Kelly game with decay, but with a 10% chance of ending each round so that it’s easier to calculate. I unfortunately don’t have enough compute to be very detailed about this, but I can say that betting 60% is better than betting 90%, which is better than all-in, supporting my thesis that the recommended bet increases but stays below 100%.
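That kind of test can be sketched roughly as follows. The bet parameters here (a double-or-nothing bet at 60% win probability, with a 10% chance of the game ending each round) are placeholder assumptions, and median final wealth is only a rough proxy for the overtaking criterion.

```python
import random, statistics

random.seed(0)

P_WIN, END_PROB = 0.6, 0.10   # assumptions: 60% even-odds bet, 10% end chance per round

def play(frac, rounds_cap=1000):
    """Final wealth from betting fraction `frac` of the bankroll until the game ends."""
    wealth = 1.0
    for _ in range(rounds_cap):
        if random.random() < END_PROB:   # the game ends this round
            break
        stake = wealth * frac
        wealth += stake if random.random() < P_WIN else -stake
    return wealth

for frac in (0.2, 0.6, 0.9, 1.0):
    finals = [play(frac) for _ in range(20_000)]
    print(f"frac={frac:.1f}  median final wealth={statistics.median(finals):.3f}")
```

With these assumed parameters the Kelly fraction for the never-ending game would be 2·0.6 − 1 = 0.2; whether the ending chance pushes the optimum above that is exactly the open question.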
You might also have meant some kind of repeated St. Petersburg bet with fractional betting, but the normal Petersburg bet doesn’t have a wagered amount. Perhaps we could assign it a price with the option of buying multiples, and then ask what fraction of your money you would want to spend on them each round? Maybe I’ll look into that next.
but the only really arbitrary seeming port is the choice about how to order the limits.
Deciding that the probability of overtaking needs to be > 0 might also count here.
Ah, right, good point. I missed that. 0.5 does seem like a more reasonable choice, so that we know the ordering is as fine-grained as possible.
The tit-for-tat strategy unravels all the way back to the beginning, and we’re back at total defection.
Is even that true? Yes, defecting in the last n rounds dominates defecting in the last n−1 rounds, but between defecting in the last 5 rounds and playing tit-for-tat all the way, the equilibrium can go either way depending on the initial population. So for all I know there might be a stable equilibrium involving only strategies that cooperate until close to the end.
I’m not sure exactly what setup you’re imagining. If we’re just thinking about Nash equilibria or correlated equilibria, there’s no “initial population”. If we’re doing something like evolving strategies for the n-round iterated game, then things will still unravel, but it can unravel pretty slowly. A genetic algorithm would have to develop the ability to count rounds, develop a special exception for defecting in the final round, and then later develop a special exception for defecting in the second-to-last round, etc. So it’s possible that a genetic algorithm would get stuck and effectively never unravel. If so, though, that’s due to limitations in its search.
(And of course this is very dependent on initial population; if it starts out with a lot of defection, it might never develop tit-for-tat in the first place.)
But this tells us that, according to my definition above, the Petersburg bet is at least as good as any finite fixed price (“at least” again because that’s only one way to win), which, if it isn’t paradoxical, means that it’s more valuable than any fixed price.
Ah, I see, so this approach differs a lot from Ole Peters’ (which suggests logarithmic utility for St Petersburg, just like for Kelly). He studies iterated St Petersburg, though (but w/o fractional betting—just an option to participate in the lotto, at the set price, or not).
OTOH, if we use a cutoff of 1⁄2 rather than 0, the story might be different; there might be a finite price after which it’s not worth it. Which would be interesting. But probably not, I think.
Defecting one round earlier dominates pure tit-for-tat, but defecting five rounds earlier doesn’t dominate pure tit-for-tat. Pure tit-for-tat is better against pure tit-for-tat. So there might be a Nash equilibrium containing only strategies that play tit-for-tat until the last few rounds.
Ah, I see, so this approach differs a lot from Ole Peters’
I looked at his paper on the Petersburg paradox and I think he gets the correct result for the iterated game. He doesn’t do fractional betting, but he has a variable for the player’s wealth—implicitly, price/wealth is a betting fraction (and since payoffs are fixed, the price implicitly sets the offered odds). Also, and this is quite confusing, even though in the beginning it sounds like he wants to repeat the game with the price fixed but wealth changing over time, his actual calculation assumes the wealth (or distribution over growth rates) is the same each time. He talks about this at the bottom of page 11 and argues that it’s fine because of commutativity. I’m not sure the commutativity argument works out, but it means the part before is effectively calculating the growth rate of a betting fraction. And if there’s no death in the game, then the highest growth rate does indeed optimize my criterion.
Conceptually, though, there are differences: Peters totally rejects ensemble averaging. This works in infinite games with no chance of death, because then one player will with certainty experience events at frequencies reflecting the true odds—so it works in ordinary Kelly betting, and it works in this Petersburg-bet-inside-a-Kelly-game, but it wouldn’t work on the versions with an ending chance.
(Also what I said about buying multiples in the last comment was confused—that would be different from one bigger bet.)
OTOH, if we use a cutoff of 1⁄2 rather than 0, the story might be different; there might be a finite price after which it’s not worth it. Which would be interesting. But probably not, I think.
Probably not, no. And provably not for the triple-payoff version, so it wouldn’t avoid the paradox anyway.
Defecting one round earlier dominates pure tit-for-tat, but defecting five rounds earlier doesn’t dominate pure tit-for-tat. Pure tit-for-tat is better against pure tit-for-tat. So there might be a Nash equilibrium containing only strategies that play tit-for-tat until the last few rounds.
Defecting in the last x rounds is dominated by defecting in the last x+1, so there is no pure-strategy equilibrium which involves cooperating in any rounds. But perhaps you mean there could be a mixed strategy equilibrium which involves switching to defection some time near the end, with some randomization.
Clearly such a strategy must involve defecting in the final round, since there is no incentive to cooperate.
But then, similarly, it must involve defecting on the second-to-last round, etc.
So it should not have any probability of cooperating—at least, not in the game-states which have positive probability.
Right? I think my argument is pretty clear if we assume subgame-perfect equilibria (and so can apply backwards induction). Otherwise, it’s a bit fuzzy, but it still seems to me like the equilibrium can’t have a positive probability of cooperating on any turn, even if players would hypothetically play tit-for-tat according to their strategies.
(For example, one equilibrium is for players to play tit-for-tat, but with both players’ first moves being to defect.)
Yeah, you’re right. I just realized that what I had in mind originally already implicitly assumed superrationality.