Simplified Poker Strategy

Previously in Sequence (Required): Simplified Poker

I spent a few hours figuring out my strategy. This is what I submitted.

If you start with a 2, you never want to bet, since your opponent will call with a 3 but fold with a 1. So we can assume no one who bets ever has a 2. But you might want to call a bet.

If you start with a 1, you never call a bet, but sometimes want to bet as a bluff.

If you start with a 3 in first position, sometimes you may want to check to allow your opponent to bet with a 1. If you have a 3 in second position, you have no decisions.

Thus, a non-dominated strategy can be represented by five probabilities: the chance you bet with a 1 in first position, the chance you bet with a 3 in first position, the chance you bet with a 1 in second position, the chance you call with a 2 in first position, and the chance you call with a 2 in second position. Call a set of these five numbers a strategy.
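As a rough illustration, the five numbers could be held in a small record like the one below. This is a sketch, not the code I submitted: the class name `Strategy` and the field names are made up for this example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Strategy:
    """Five probabilities describing a non-dominated strategy (names are illustrative)."""
    bet_1_first: float    # bluff with a 1 when acting first
    bet_3_first: float    # bet a 3 when acting first (otherwise check to invite a bluff)
    bet_1_second: float   # bluff with a 1 after the first player checks
    call_2_first: float   # call with a 2 after checking and then facing a bet
    call_2_second: float  # call with a 2 when the first player bets

    def __post_init__(self):
        # Every field is a probability, so reject anything outside [0, 1].
        for f in fields(self):
            p = getattr(self, f.name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{f.name} must be between 0 and 1, got {p}")


# Example: bluff a third of the time, bet a 3 two-thirds of the time, call half the time.
example = Strategy(1 / 3, 2 / 3, 1 / 3, 0.5, 0.5)
```

Everything else (never betting a 2, never calling with a 1, always calling with a 3) is fixed by the reasoning above, so it needs no field.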

There were likely to be a few players bad enough to bet with a 2 or perhaps make the other mistakes, but I chose for complexity reasons not to worry about that, assuming I'd still do something close to optimal. Had I been confident complexity was free, I'd have included a check to see if we ever caught the opponent doing something crazy, and adjusted accordingly.

If you know the opposing strategy, what to do is obvious. Thus, I defined a function called 'best response' that takes a strategy and outputs the strategy that maximizes expected winnings against it.
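Here is a minimal sketch of what such a function could look like, not a reconstruction of my actual code. It assumes the rules from the earlier post are a one-chip ante and a one-chip bet, with a check-then-bet line available to the second player; the names `Strategy`, `first_player_ev`, `expected_value`, and `best_response` are invented for the example. Because expected value is linear in each of the five probabilities, some all-or-nothing strategy is always a best response, so checking the 32 pure strategies is enough.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class Strategy:
    bet_1_first: float
    bet_3_first: float
    bet_1_second: float
    call_2_first: float
    call_2_second: float


def first_player_ev(first, second, c1, c2):
    """Expected chips won by the first player, holding c1 against c2.

    Assumed rules: 1-chip ante, 1-chip bet. Dominated plays are excluded:
    nobody bets a 2, nobody calls with a 1, a 3 always calls, and a 3 in
    second position always bets after a check.
    """
    sign = 1 if c1 > c2 else -1  # +1 if the first player wins a showdown
    p_bet = {1: first.bet_1_first, 2: 0.0, 3: first.bet_3_first}[c1]
    p_second_calls = {1: 0.0, 2: second.call_2_second, 3: 1.0}[c2]
    p_second_bets = {1: second.bet_1_second, 2: 0.0, 3: 1.0}[c2]
    p_first_calls = {1: 0.0, 2: first.call_2_first, 3: 1.0}[c1]

    # Bet: a call means a 2-chip showdown, a fold wins the opponent's ante.
    ev_bet = p_second_calls * 2 * sign + (1 - p_second_calls) * 1
    # Check, then face a bet: call for a 2-chip showdown or fold and lose the ante.
    ev_facing_bet = p_first_calls * 2 * sign - (1 - p_first_calls)
    # Check: either the second player bets, or it is a 1-chip showdown.
    ev_check = p_second_bets * ev_facing_bet + (1 - p_second_bets) * sign
    return p_bet * ev_bet + (1 - p_bet) * ev_check


def expected_value(me, opp):
    """Average chips per hand for `me`, over all six deals and both positions."""
    deals = list(permutations((1, 2, 3), 2))
    total = 0.0
    for mine, theirs in deals:
        total += first_player_ev(me, opp, mine, theirs)  # I act first
        total -= first_player_ev(opp, me, theirs, mine)  # opponent acts first
    return total / (2 * len(deals))


def best_response(opp):
    """Return a pure strategy that maximizes expected value against `opp`."""
    candidates = (Strategy(*bits) for bits in product((0.0, 1.0), repeat=5))
    return max(candidates, key=lambda s: expected_value(s, opp))
```

For instance, against an opponent who never bluffs and always calls with a 2, `best_response` returns the strategy that always bets a 3 in first position and otherwise never bluffs and never calls with a 2.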

My goal was to derive the opponent's strategy, then play the best response to that strategy.

As a safeguard against opponents who were anticipating such a strategy, I included an escape hatch: if at any point my opponent got ahead by 10 or more chips, I would assume they were a level ahead of me and playing the best response to what I would otherwise do. So derive what that is, and play the best response to that!

That skipped over the key puzzle, which is figuring out what the opponent is doing. On the first turn, I guessed opponents would pursue reasonable mixed strategies: bet a 1 about a third of the time, bet a 3 in first position about two-thirds of the time, call with a 2 about half the time. I represented this with a virtual hand history that I included until I had enough real ones.
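One way to picture the virtual hand history is as a handful of imaginary observations that match the opening guess and get dropped once enough real hands are in. The sketch below is only an illustration of that idea: the weights `VIRTUAL_HANDS` and `ENOUGH_REAL_HANDS` and the function name `estimate` are assumptions, not values or names from my submission.

```python
# Opening guess for each decision, taken from the paragraph above.
OPENING_GUESS = {
    "bet_1_first": 1 / 3,
    "bet_3_first": 2 / 3,
    "bet_1_second": 1 / 3,
    "call_2_first": 1 / 2,
    "call_2_second": 1 / 2,
}

VIRTUAL_HANDS = 6        # assumed: how many imaginary hands back the opening guess
ENOUGH_REAL_HANDS = 10   # assumed: real observations needed before dropping them


def estimate(decision, times_taken, opportunities):
    """Estimate how often the opponent takes the aggressive action at `decision`.

    `times_taken` is how often they bet (or called) out of `opportunities`
    real chances to do so.
    """
    if opportunities >= ENOUGH_REAL_HANDS:
        # Enough real data: ignore the virtual history entirely.
        return times_taken / opportunities
    prior = OPENING_GUESS[decision]
    # Otherwise blend real hands with virtual hands that follow the guess.
    return (times_taken + VIRTUAL_HANDS * prior) / (opportunities + VIRTUAL_HANDS)
```

With no real hands the estimate is just the opening guess, and an opponent who has called with a 2 on all four chances so far would be estimated at 0.7 rather than 1.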

On subsequent turns, I looked at the hand history.

If the opponent's card was revealed, that was a pure data point – if we knew they bet with a 1, that's a hand where they did that.

If the opponent's card wasn't revealed, but only one card made any sense, I assumed they had that card. Thus, if I bet with a 1 and they fold, I assume they had a 2.

If the opponent's card wasn't revealed, and they could have had either card because I bet a 3 and they folded, or they bet and I folded a 2, that's trickier. The probability of them having each card in that spot depends on their strategy. And again, there was an (unknown, soft) complexity limit.

My solution was to assume that in each unique starting position (my position plus my card), half the time my opponent would draw the higher of the two cards I hadn't drawn, and half the time they'd draw the lower one. So half the time I have a 2 in first position, they have a 3, and half the time they have a 1.
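Under that half-and-half assumption, the hidden-card spots can be handled with simple arithmetic on observed frequencies. The sketch below is one possible reading, not necessarily what my code did, and the function names are invented. If I check a 2 and the opponent bets a fraction f of the time, half of those hands are 3s that always bet, so the implied bluff rate with a 1 is 2f - 1. If I bet a 3 and they call a fraction c of the time, only the 2s can be calling, so the implied call rate with a 2 is 2c.

```python
def implied_bluff_rate(bets_seen, hands):
    """Bluff rate with a 1, from hands where I held a 2 and checked.

    Assumes the opponent held a 3 (always bets) on half of those hands
    and a 1 (bets only as a bluff) on the other half.
    """
    if hands == 0:
        return None  # no data for this spot yet
    bet_rate = bets_seen / hands
    # bet_rate = 0.5 * 1 + 0.5 * bluff_rate, clamped to a valid probability.
    return min(1.0, max(0.0, 2 * bet_rate - 1))


def implied_call_rate(calls_seen, hands):
    """Call rate with a 2, from hands where I held a 3 and bet.

    Assumes the opponent held a 1 (always folds) on half of those hands
    and a 2 (sometimes calls) on the other half.
    """
    if hands == 0:
        return None
    call_rate = calls_seen / hands
    # call_rate = 0.5 * 0 + 0.5 * call_rate_with_2, clamped to a valid probability.
    return min(1.0, 2 * call_rate)
```

The clamping matters early on, when a short run of 3s or 1s can push the raw estimate outside the range from 0 to 1.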

That was certainly not ideal, and I don't remember exactly how I did it, but it definitely did the thing it was designed to do: identify exploitable agents lightning fast, and do something reasonable against reasonable ones. Trying to optimize the details of this type of approach is an interesting puzzle, both with and without a complexity limitation.