Simplified Poker Strategy

Previously in Sequence (Required): Simplified Poker

I spent a few hours figuring out my strategy. This is what I submitted.

If you start with a 2, you never want to bet: your opponent will call with a 3 and fold with a 1, so a bet can only cost you extra chips. We can therefore assume that no one who bets ever has a 2. You might still want to call a bet, though.

If you start with a 1, you never call a bet, but sometimes want to bet as a bluff.

If you start with a 3 in first position, you may sometimes want to check to allow your opponent to bet with a 1. If you have a 3 in second position, you have no decisions: you always bet or call.

Thus, a non-dominated strategy can be represented by five probabilities: the chance you bet with a 1 in first position, the chance you bet with a 3 in first position, the chance you bet with a 1 in second position, the chance you call with a 2 in first position, and the chance you call with a 2 in second position. Call a set of these five numbers a strategy.
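As an illustration (not the submitted code), the five numbers can be held in a small record. The class and field names here are hypothetical labels for the five probabilities listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Strategy:
    """Five probabilities describing a non-dominated strategy (names are hypothetical)."""
    bet_1_first: float    # chance to bet (bluff) with a 1 in first position
    bet_3_first: float    # chance to bet with a 3 in first position
    bet_1_second: float   # chance to bet (bluff) with a 1 in second position
    call_2_first: float   # chance to call with a 2 in first position
    call_2_second: float  # chance to call with a 2 in second position

    def __post_init__(self):
        # Each entry is a probability, so reject anything outside [0, 1].
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name} must be in [0, 1], got {value}")


# An arbitrary example strategy, just to show the shape of the data.
example = Strategy(1 / 3, 2 / 3, 1 / 3, 1 / 2, 1 / 2)
```

Making the record frozen keeps a strategy hashable and safe to pass around while estimates are being updated elsewhere.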

There were likely to be a few players bad enough to bet with a 2, or perhaps to make the other mistakes, but for complexity reasons I chose not to worry about that, assuming I’d still do something close to optimal. If I had been confident that complexity was free, I’d have included a check to see whether we ever caught the opponent doing something crazy, and adjusted accordingly.

If you know the opposing strategy, what to do is obvious. Thus, I defined a function called ‘best response’ that takes a strategy and outputs the strategy that maximizes expected winnings against it.
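Here is a minimal sketch of what such a function could look like; it is not the submitted code. It assumes the rules from the previous post are a 1-chip ante with 1-chip bets, that both players stick to non-dominated plays, and that a strategy is a plain tuple in the order (bet 1 first, bet 3 first, bet 1 second, call 2 first, call 2 second). Because expected value is linear in each of the five probabilities, it is enough to try the 32 pure strategies.

```python
from itertools import permutations, product


def first_player_ev(first, second, a, b):
    """Expected chips won by the first player holding card a against card b.

    Assumes a 1-chip ante and 1-chip bets, and that neither side makes
    dominated plays (a 2 never bets, a 1 never calls, a 3 always calls,
    and a 3 in second position always bets after a check).
    """
    sign = 1 if a > b else -1
    p_bet = {1: first[0], 2: 0.0, 3: first[1]}[a]        # first player bets
    p_call = {1: 0.0, 2: second[4], 3: 1.0}[b]           # second player calls that bet
    p_bet_back = {1: second[2], 2: 0.0, 3: 1.0}[b]       # second player bets after a check
    p_call_back = {1: 0.0, 2: first[3], 3: 1.0}[a]       # first player calls that bet

    ev_if_bet = (1 - p_call) * 1 + p_call * 2 * sign
    ev_if_check = (1 - p_bet_back) * sign + p_bet_back * (
        (1 - p_call_back) * -1 + p_call_back * 2 * sign
    )
    return p_bet * ev_if_bet + (1 - p_bet) * ev_if_check


def expected_value(me, opponent):
    """Average chips per hand for `me`, over all six deals and both seats."""
    total = 0.0
    for a, b in permutations((1, 2, 3), 2):
        total += first_player_ev(me, opponent, a, b)   # I act first holding a
        total -= first_player_ev(opponent, me, a, b)   # opponent acts first holding a
    return total / 12


def best_response(opponent):
    """Brute-force the best of the 32 pure strategies against `opponent`."""
    candidates = product((0.0, 1.0), repeat=5)
    return max(candidates, key=lambda me: expected_value(me, opponent))


# Hypothetical opponent: never bluffs, always bets a 3, always calls with a 2.
calling_station = (0.0, 1.0, 0.0, 1.0, 1.0)
print(best_response(calling_station))
```

Against that opponent the search settles on never bluffing, always betting a 3, and always folding a 2, which matches the intuition that you should not bluff someone who always calls and should not pay off someone who never bluffs.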

My goal was to derive my opponent’s strategy, then play the best response to that strategy.

As a safeguard against opponents who were anticipating such a strategy, I included an escape hatch: if at any point my opponent got ahead by 10 or more chips, I would assume they were a level ahead of me and playing the best response to what I would otherwise do. So derive what that is, and play the best response to that!
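The escape hatch can be sketched as a thin wrapper around a best-response function. This is an illustration rather than the submitted code: the function and parameter names are hypothetical, `best_response` is passed in as a callable, and the sketch checks the current chip margin on each hand, which is only one reading of "at any point".

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # whatever type is used to represent a strategy


def choose_strategy(
    estimate: S,
    my_chips: int,
    opponent_chips: int,
    best_response: Callable[[S], S],
    lead_trigger: int = 10,
) -> S:
    """Pick a strategy, jumping up a level if the opponent is far ahead."""
    # Default plan: best response to the estimated opposing strategy.
    default = best_response(estimate)
    if opponent_chips - my_chips < lead_trigger:
        return default
    # Escape hatch: assume the opponent is best-responding to the default
    # plan, work out what that would be, and best-respond to it.
    their_counter = best_response(default)
    return best_response(their_counter)
```

Once the trigger fires, the result is three applications of the best-response function to the original estimate: mine, theirs, then mine again.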

That skips over the key puzzle, which is figuring out what the opponent is doing. On the first turn, I guessed opponents would pursue reasonable mixed strategies: bet a 1 about a third of the time, bet a 3 in first position about two thirds of the time, call with a 2 about half the time. I represented this with a virtual hand history that I included until I had enough real hands.
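One way to picture a virtual hand history is as a handful of made-up observations that get mixed into the real counts. The sketch below is an assumption about how that might be coded, not a record of what was submitted: the size of the virtual history, the cutoff `MIN_REAL`, and all names are hypothetical.

```python
# Virtual (times_acted, opportunities) per decision, matching the opening guess:
# bet a 1 one time in three, bet a 3 first two times in three, call with a 2 half the time.
VIRTUAL_HISTORY = {
    "bet_1": (1, 3),
    "bet_3_first": (2, 3),
    "call_2": (1, 2),
}

MIN_REAL = 6  # hypothetical number of real observations that counts as "enough"


def estimate_frequency(decision, real_acted, real_opportunities, min_real=MIN_REAL):
    """Estimate how often the opponent takes the action for `decision`."""
    if real_opportunities >= min_real:
        # Enough real hands: drop the virtual ones entirely.
        return real_acted / real_opportunities
    virtual_acted, virtual_opportunities = VIRTUAL_HISTORY[decision]
    return (virtual_acted + real_acted) / (virtual_opportunities + real_opportunities)
```

With no real hands the estimate is exactly the opening guess, and each real observation pulls it toward what the opponent actually does.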

On subsequent turns, I looked at the hand history.

If the opponent’s card was revealed, that was a pure data point: if we knew they bet with a 1, that’s a hand where they did that.

If the opponent’s card wasn’t revealed, but only one card made any sense, I assumed they had that card. Thus, if I bet with a 1 and they folded, I assumed they had a 2.

If the opponent’s card wasn’t revealed, and they could have had either card, because I bet a 3 and they folded or because they bet and I folded a 2, that’s trickier. The probability of them having each card in that spot depends on their strategy. And again, there was an (unknown, soft) complexity limit.
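The cases where the card stays hidden can be enumerated mechanically. The sketch below assumes the opponent never makes a dominated play (a 3 always calls, a 2 never bets); the function name and the two ending labels are hypothetical.

```python
def possible_hidden_cards(my_card, how_hand_ended):
    """Cards the opponent could hold when the hand ends without a showdown.

    how_hand_ended is "they_folded" (they folded to my bet) or
    "i_folded" (they bet and I folded).
    """
    candidates = {1, 2, 3} - {my_card}
    if how_hand_ended == "they_folded":
        candidates.discard(3)   # a 3 always calls
    elif how_hand_ended == "i_folded":
        candidates.discard(2)   # a 2 never bets
    else:
        raise ValueError(f"unknown ending: {how_hand_ended!r}")
    return candidates
```

A single remaining card is treated as known; two remaining cards are exactly the two tricky spots described above.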

My solution was to assume that in each unique starting position (my position plus my card), half the time my opponent would draw the higher of the two cards I hadn’t drawn, and half the time the lower one. So of the times I have a 2 in first position, half the time they have a 3 and half the time they have a 1.
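To show how the half-and-half assumption can turn ambiguous hands into estimates, here is one possible reading, which may differ from what was actually submitted. It assumes the opponent always bets a 3 after a check and always folds a 1 to a bet, so any excess over half is attributed to the mixed decision. All names are hypothetical.

```python
def _clamp(x):
    """Keep an estimate inside the valid probability range."""
    return max(0.0, min(1.0, x))


def estimate_bluff_rate(bets_seen, times_i_checked_a_2):
    """Chance the opponent bets a 1 in second position.

    If half of their hands are 3s (which always bet) and half are 1s
    (which bet with probability p), the observed bet fraction is
    1/2 + p/2, so p = 2 * fraction - 1.
    """
    if times_i_checked_a_2 == 0:
        raise ValueError("no hands to estimate from")
    fraction = bets_seen / times_i_checked_a_2
    return _clamp(2 * fraction - 1)


def estimate_call_rate_with_2(calls_seen, times_i_bet_a_3):
    """Chance the opponent calls with a 2 in second position.

    If half of their hands are 1s (which always fold) and half are 2s
    (which call with probability c), the observed call fraction is c/2.
    """
    if times_i_bet_a_3 == 0:
        raise ValueError("no hands to estimate from")
    fraction = calls_seen / times_i_bet_a_3
    return _clamp(2 * fraction)
```

For example, if the opponent bet 15 of the 20 times I checked a 2, the first function attributes 10 of those bets to 3s and the other 5 to bluffs out of 10 hands with a 1, for an estimated bluff rate of one half.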

That was definitely not ideal, and I don’t remember exactly how I did it, but it definitely did the thing it was designed to do: identify exploitable agents lightning fast, and do something reasonable against reasonable ones. Trying to optimize the details of this type of approach is an interesting puzzle, both with and without a complexity limitation.
