# Simplified Poker

This is in­tended as a three-part se­quence. Part two will go over my strat­egy. Part three will re­veal the re­sults and dis­cuss some im­pli­ca­tions.

In the same class in which we later played The Dar­win Game, we played a less com­plex game called Sim­plified Poker. As in The Dar­win Game, we were given the rules and asked to sub­mit in­struc­tions for a com­puter pro­gram that would play the game, and the pro­fes­sor would then code our pro­grams for us.

The rules of Sim­plified Poker are as fol­lows:

Game is played with a 3-card deck, with the cards la­beled 1, 2 and 3.

Each hand, the play­ers al­ter­nate who goes first, each player antes one chip and is dealt one card.

The first player can bet one chip, or check.

If the first player bets, the sec­ond player can ei­ther call the one chip bet, or fold.

If the first player checks, the sec­ond player can ei­ther also check, or can bet. If the sec­ond player bets, the first player can ei­ther call the one chip bet, or fold.

There is at most one bet per hand, as nei­ther player is al­lowed to raise.

If ei­ther player folds, the other wins the pot of 2 chips and takes back their 1 chip bet. Nei­ther card is shown. If nei­ther player folds – ei­ther both play­ers check, or there is a bet and a call – then both cards are re­vealed and the player with the higher card takes all 4 chips.
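To make the payoffs concrete, here is a minimal Python sketch of how one hand resolves. The function name and the action strings are labels made up for this illustration, not part of the assignment, and it assumes a showdown after two checks is played for the 2 chips of antes only.

```python
def hand_result(first_card, second_card, actions):
    """Net chips won (positive) or lost (negative) by the first player.

    `actions` is a tuple of strings describing the betting, one of:
      ("bet", "call"), ("bet", "fold"), ("check", "check"),
      ("check", "bet", "call"), ("check", "bet", "fold")
    These labels are hypothetical names chosen for this sketch.
    """
    if actions == ("bet", "fold"):
        return 1   # second player folds; first player wins the other ante
    if actions == ("check", "bet", "fold"):
        return -1  # first player folds; loses only the ante
    if actions == ("check", "check"):
        stake = 1  # assumption: only the antes are at risk
    elif actions in (("bet", "call"), ("check", "bet", "call")):
        stake = 2  # ante plus the one-chip bet
    else:
        raise ValueError(f"not a legal betting sequence: {actions!r}")
    return stake if first_card > second_card else -stake


if __name__ == "__main__":
    print(hand_result(3, 1, ("bet", "call")))           # first player wins 2
    print(hand_result(1, 2, ("check", "bet", "fold")))  # first player loses 1
```

Because the result is expressed as a net change, summing it over 50 hands gives the match score that the round robin cares about.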

In the class, all pro­grams would play a round robin with all other pro­grams, with 50 hands per match. Your goal is to max­i­mize the av­er­age num­ber of chips won over all rounds – note that how many op­po­nents you beat does not mat­ter, only the num­ber of chips won.

The game is sim­ple. A lot, but far from all, of your de­ci­sions are forced. There’s no weird trick, but op­ti­mal play still isn’t ob­vi­ous. I’ll pause here to al­low and en­courage think­ing about what strat­egy you’d sub­mit.

• If nei­ther player folds – ei­ther both play­ers check, or there is a bet and a call – then both cards are re­vealed and the player with the higher card takes all 4 chips.

Pre­sum­ably you mean to say that if both play­ers check, the player with the higher card takes the 2 chip pot, not 4 chips?

Any­way here is the unique (up to choice of x) Nash equil­ibrium, if my calcu­la­tions are cor­rect:

1 tbrf svefg: org jvgu cebonovyvgl k, arire pnyy

2 tbrf svefg: arire org, pnyy jvgu cebonovyvgl k+1/3

3 tbrf svefg: org jvgu cebonovyvgl 3k, nyjnlf pnyy

1 tbrf frpbaq: org jvgu cebonovyvgl 1/3, arire pnyy

2 tbrf frpbaq: arire org, pnyy jvgu cebonovyvgl 1/3

3 tbrf frpbaq: nyjnlf org, nyjnlf pnyy

• It’s worth not­ing that against the ac­tual field I faced, you can do much bet­ter than Nash.

• Heh, I tried to game-theory it all out and got nonsensical answers (my set of equilibria wasn’t convex!), and I wasn’t about to redo it, so I’m glad someone managed to determine [what I’m presuming is] the correct Nash equilibrium...

It is a lit­tle sur­pris­ing to me that nyjnlf pur­px­vat jura fgneg­vat jvgu n 2 vf gur bayl sbeprq pub­vpr jvgubhg na boivbhf ern­fba sbe vg. Gur bgure 6 sbeprq pub­vprf ner rnfl gb qr­grez­var jvgubhg uni­vat gb qb zhpu zngu, ohg abg gung bar. Naq gur bgure 5 pub­vprf gung qb erd­hver qb­vat zngu gb qr­grez­var qba’g raq hc nf 1f be 0f.

But yeah, as Zvi says, I guess part of the whole question is how far into donkeyspace you want to go. I would’ve stuck with Nash, if I could figure it out, but...

• Jryy, bapr lbh xabj gung frp­baq-1 ar­ire pnyyf naq frp­baq-3 nyjnlf orgf naq pnyyf, lbh xabj gung svefg-2: purpx naq pnyy fgevp­gyl qbz­van­grf svefg-2: org (fvapr vg qbrf gur fnzr nt­nvafg 3 naq fyv­tu­gyl orggre nt­nvafg 1).

• Ohg vs nf svefg-2 nt­nvafg frp­baq-1 lbh purpx, frp­baq-1 zv­tug org naq (fvapr lbh qba’g xabj vg’f 1 en­gure guna 3) lbh zv­tug sbyq va erfcbafr (-1 sbe lbh), zr­na­vat lbh raq hc qb­vat jbefr guna lbh jb­hyq unir unq lbh org (+1 sbe lbh). Fb V qba’g frr ubj vg qbz­van­grf?

• Gu­vax bs purpx-naq-pnyy nf n cevzvgvir npgvba, gura lbh xabj lbh jvyy abg sbyq. Vg’f gehr gung vg’f abg boivbhf gung purpx-naq-sbyq qbz­van­grf org.

• Ah, makes sense now, thanks!

• Curated, both for providing a clear exercise (this is something the mod team would like to see a lot more of), as well as for the followup posts that discuss that exercise and some of the important, broader ramifications.

• It seems to me like you have four de­ci­sions (bet/​check if first, call/​fold if sec­ond, bet/​check if sec­ond, and call/​fold if first) in three states of knowl­edge, mean­ing twelve pos­si­ble pure agents—with the challenge that the agent might be shift­ing over time, as not only can they play a mixed strat­egy of pure agents, but also they can change that mix ac­cord­ing to what hap­pens in the game.

It’s not ob­vi­ous to me that, with only fifty rounds, you have enough time to iden­tify ex­ploitable agents and ex­ploit them (es­pe­cially since it looks like mixed strate­gies are al­lowed). Which then leads to the ques­tion of whether you should just sub­mit Nash (as Da­cyn lays out), some­thing that un­der­performs when play­ing Nash but over­performs when play­ing naive agents (note that there’s no pop­u­la­tion change, so the im­por­tant dis­tri­bu­tion is the sub­mit­ted dis­tri­bu­tion), or some­thing that tries to adapt.

So what does a naive bot look like? If it’s look­ing at a 1, it’ll check or fold. If it’s look­ing at a 3, it’ll bet or call. If it’s look­ing at a two, it’ll prob­a­bly flip a coin to play ag­gres­sively (bet­ting /​ call­ing) or defen­sively (check­ing /​ fold­ing) or just play defen­sively.

The difficulty of iden­ti­fy­ing the op­po­nent’s strat­egy sug­gests to me a two-phase plan, as op­posed to a fully re­ac­tive one. Start off play­ing overly ag­gres­sively, track how many chips you’re up or down, and then switch to play­ing Nash if down a par­tic­u­lar amount. (Per­haps it just always bets and calls? Without think­ing of more naive agents, it’s not clear what’ll ex­ploit them best.)

• >It’s not ob­vi­ous to me that, with only fifty rounds, you have enough time to iden­tify ex­ploitable agents and ex­ploit them

That was the first line of think­ing I had too.

If so, you might want to check-open and call down with your 2′s early in the game to see if you’re go­ing to only be pay­ing 3′s, or what fre­quency they bluff 1′s.

Also very im­por­tant to ob­serve whether they call-down with 2′s or toss them in ei­ther po­si­tion, and with what fre­quency.

On a quick think, the in­abil­ity to raise means that bluffing (1′s) and semi-bluffing (2′s) is prob­a­bly more vi­able… if the op­po­nent wasn’t run­ning any based-on-op­po­si­tion-ad­just­ment, I won­der to what ex­tent “bet ev­ery hand from open­ing po­si­tion” might play out. It’s cer­tainly not op­ti­mal but might do bet­ter than ex­pected. Pre­sum­ably sec­ond-po­si­tion always folds 1′s and calls 3′s, so you’d wind up with —

ALWAYS BET OPENING HANDS

If each hand mix is around 1/​9th of the dis­tri­bu­tion of hands (it doesn’t say how large the deck is)...

1v1: You steal a lit­tle more than a half chip than ex­pected, in­stead of push or fold to bluff. Op­po­nent never calls.

1v2: You steal one chip… 50% of the time? 70%? And lose an ex­tra chip the re­main­der of the time.

1v3: You lose two chips 100% of the time, in­stead of los­ing one chip as ex­pected.

2v1: You get one chip with min­i­mal gained equity (small value in you don’t have to face a bluff).

2v2: You steal one chip… 50% of the time? 70%? And you push the re­main­der of the time.

2v3: You lose two chips 100% of the time, in­stead of los­ing one chip as ex­pected.

3v1: They fold, gain one chip as ex­pected.

3v2: They fold… 50% of the time? 70%? You get a sec­ond chip the re­main­der of the time.

3v3: They call, it’s a push.

So you’d wind up with...

1v1: +0.5 ex­pected over equity.

1v2: +equity if they fold at 67% or higher, -equity if they fold 66% or less.

1v3: −1 ex­pected over equity.

2v1: Min­i­mally small +equity for not hav­ing to make de­ci­sion against bluff, but not much gained.

2v2: Guaran­teed +equity if they ever fold, with their fold rate of 2′s be­ing the equity gained rate.

2v3: Slightly less than −1 ex­pected over equity (since you’d check-call some­times to pro­tect).

3v1: They fold, as ex­pected. Breakeven.

3v2: +equity on what­ever their call­ing rate is.

3v3: Push, as ex­pected. Breakeven

As­sum­ing the deck is large enough that these are all similarly weighted matchups (EX 1v1 or 2v2 isn’t much less likely), then you’d get +0.5, ???, −1, +tiny, +medium, −1, 0, +medium, 0 [as com­pared to ex­pected nor­mal equity].

I think that comes out ahead — I’d spec­u­late that among peo­ple who don’t sit and run anal­y­sis, that they fold 2′s to open bets slightly too of­ten. Or, hmm, maybe not. You could do bet­ter than this of course… but the fact that I’m even run­ning the anal­y­sis to see if “liter­ally bet ev­ery­thing from open­ing po­si­tion” is good shows just how much the in­abil­ity to raise changes the game.

• FWIW, you aren’t play­ing 50 rounds of one game. You’re re­ally play­ing 25 rounds each of 2 games, so you have half as many pa­ram­e­ters to es­ti­mate in half as many ob­ser­va­tions.

• Can my program use the history of hands it played against the current opponent? What about the history of hands the opponent played against others?

• On this topic, I recom­mend _The Math­e­mat­ics of Poker_ by Anken­man and Chen. They fully solve a num­ber of sim­plified poker games like this one.

• 8 months late. I’m com­ing into this cold but hav­ing pre­vi­ously read about a very similar com­pe­ti­tion to cre­ate strate­gies to play Rock-Paper-Scis­sors (RPS). First, work out all the de­ci­sion points in the game, and the pos­si­ble in­for­ma­tion available at each de­ci­sion point. We end up with 2 bi­nary de­ci­sions for each player, and 3 states of in­for­ma­tion at each de­ci­sion point.

So my first strat­egy is to pre­dict my op­po­nent’s de­ci­sions, and calcu­late which of my pos­si­ble de­ci­sions will give me the best re­sult. For RPS this is pretty sim­ple:

P(R), P(P), P(S): prob­a­bil­ities my op­po­nent will play Rock, Paper, and Scis­sors.

V(R), V(P), V(S): ex­pected score (value) for me play­ing Rock, Paper, Scis­sors.

V(R) = (P(R) * 0 + P(P) * −1 + P(S) * 1) /​ (P(R) + P(P) + P(S))

The calcu­la­tion on the line above is for the gen­eral case. For the spe­cific case of RPS, it sim­plifies to:

V(R) = P(S) - P(P)

V(P) = P(R) - P(S)

V(S) = P(P) - P(R)

A sur­pris­ing num­ber of com­peti­tors fail to play op­ti­mally against their op­po­nent’s pre­dicted ac­tions. For ex­am­ple, with P(R) = 0.45, P(P) = 0.16, P(S) = 0.39, many com­peti­tors play Paper, even though the best ex­pected value is from play­ing Rock. (Op­ti­mal play ex­ploits un­usu­ally low prob­a­bil­ities as well as un­usu­ally high prob­a­bil­ities.)
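A minimal sketch of that calculation in Python, assuming the predicted probabilities are plain floats; the function names `rps_values` and `best_move` are invented for this example.

```python
def rps_values(p_rock, p_paper, p_scissors):
    """Expected score of each of my moves, given predicted opponent probabilities.

    Scores are +1 for a win, 0 for a tie, -1 for a loss. The inputs are
    normalized first, which is the general-case division from the text.
    """
    total = p_rock + p_paper + p_scissors
    r, p, s = p_rock / total, p_paper / total, p_scissors / total
    return {
        "R": s - p,  # Rock beats Scissors, loses to Paper
        "P": r - s,  # Paper beats Rock, loses to Scissors
        "S": p - r,  # Scissors beats Paper, loses to Rock
    }


def best_move(p_rock, p_paper, p_scissors):
    """Move with the highest expected score."""
    values = rps_values(p_rock, p_paper, p_scissors)
    return max(values, key=values.get)


if __name__ == "__main__":
    print(rps_values(0.45, 0.16, 0.39))
    print(best_move(0.45, 0.16, 0.39))
```

With the numbers from the example, Rock scores about 0.23, Paper about 0.06, and Scissors about -0.29, so Rock is the best reply even though Rock is also the opponent's most likely move.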

In RPS there are three pos­si­ble de­ci­sions, but in sim­plified poker all the de­ci­sion points are bi­nary, so we can use A and !A to rep­re­sent both prob­a­bil­ities, in­stead of A, B, and C. I choose to rep­re­sent bet­ting and call­ing as di­rect prob­a­bil­ities, and check­ing and fold­ing as the com­ple­men­tary prob­a­bil­ities.

A, B, C: player #1 bets with a 1, 2, or 3 respectively

D, E, F: af­ter a check and a bet, player #1 calls with a 1, 2, 3

G, H, I: af­ter a bet, player #2 calls with a 1, 2, 3

J, K, L: af­ter a check, player #2 bets with a 1, 2, 3

The ex­pected value calcu­la­tions are more com­pli­cated than in RPS (among other things, you can be un­cer­tain about the cur­rent state of the game be­cause you don’t know which card your op­po­nent has, and the out­come of player #1′s game some­times de­pends on its own fu­ture de­ci­sions), but thanks to the bi­nary de­ci­sions the re­sults can be sim­plified al­most as much as in RPS.

D(A), D(B), etc.: con­di­tion nec­es­sary to de­cide to do A, B, etc. Calcu­late V(A) and V(!A), then D(A) = V(A) > V(!A) and D(!A) = V(A) < V(!A). If they’re equal, then you play your pre­de­ter­mined Nash equil­ibrium strat­egy.

Player #1:

D(A) = 4/3 > P(H) + P(I)

D(B) = 2 + P(G) + z > 3 * P(I), where z = P(L) - P(J) when 3 * P(J) > P(L) and z = 2 * P(J) when 3 * P(J) < P(L)

D(C) = P(G) + P(H) > P(J) + P(K)

D(D) = false

D(E) = 3 * P(J) > P(L)

D(F) = true

Player #2:

D(G) = false

D(H) = 3 * P(A) > P(C)

D(I) = true

D(J) = P(!B) * (2 * P(!E) - P(E)) > P(!C) * (P(F) − 2 * P(!F))

D(K) = P(!A) * P(D) > P(!C) * (3 * P(F) − 2)

D(L) = P(!A) * P(D) + P(!B) * P(E) > 0
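As a worked example of where these conditions come from, here is the derivation of D(A), written out in LaTeX. It assumes the opponent is equally likely to hold either of the two remaining cards and counts payoffs as net chips (+1 for stealing the ante, −2 for a called bet that loses, −1 for losing only the ante); that bookkeeping is my own reading of the rules.

$$
\begin{aligned}
V(A) &= \tfrac12\bigl[(1 - P(H)) - 2\,P(H)\bigr] + \tfrac12\bigl[(1 - P(I)) - 2\,P(I)\bigr]
      = 1 - \tfrac32\bigl(P(H) + P(I)\bigr) \\
V(!A) &= -1 \quad \text{(a 1 loses any showdown and folds to any bet)} \\
D(A) &:\; 1 - \tfrac32\bigl(P(H) + P(I)\bigr) > -1
      \iff \tfrac43 > P(H) + P(I)
\end{aligned}
$$

The other conditions follow the same pattern: write the expected value of each branch, weight by how likely the opponent's card is given what has happened so far, and compare.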

Trans­lated back to English:

#1 with a 1: If you pre­dict #2 will fold of­ten enough, then bet (bluff), oth­er­wise check, and always fold if #2 bets.

#1 with a 2: Bet only if you pre­dict #2 will call with a 1 and fold with a 3 enough more than bluffing with a 1 and check­ing with a 3. Call af­ter #2 bets if there’s a high enough chance it’s a bluff.

#1 with a 3: Bet or check depending on whether #2 is more likely to call your bet or bet after you check. Always call if #2 bets.

#2 with a 1: If #1 bets, fold. If #1 checks and will fold of­ten enough, then bluff, oth­er­wise check.

#2 with a 2: If #1 bets, call if the chances of a bluff are high enough, otherwise fold. If #1 checks, check unless you predict #1 will call with a 1 and fold with a 3 often enough combined to be worth it.

#2 with a 3: If #1 bets, call. If #1 checks, bet.
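The decision rules can be sketched as a pair of best-response functions in Python. This is an illustration under stated assumptions rather than a tested tournament entry: the function names are invented here, the dictionaries reuse the letters A through L as keys, and ties resolve to the passive action (False) instead of falling back to a Nash strategy. The K condition is written with 3 * P(F) − 2 on the right-hand side, which is what comparing the bet and check payoffs gives.

```python
def best_response_first(p):
    """Pure best response for player #1.

    `p` maps "G".."L" to predicted probabilities for player #2
    (G, H, I: calls a bet with a 1, 2, 3; J, K, L: bets after a check).
    Returns True for bet/call, False for check/fold.
    """
    call_with_2 = 3 * p["J"] > p["L"]                     # D(E)
    z = (p["L"] - p["J"]) if call_with_2 else 2 * p["J"]
    return {
        "A": 4 / 3 > p["H"] + p["I"],                     # bluff with a 1
        "B": 2 + p["G"] + z > 3 * p["I"],                 # bet with a 2
        "C": p["G"] + p["H"] > p["J"] + p["K"],           # bet with a 3
        "D": False,                                       # never call with a 1
        "E": call_with_2,
        "F": True,                                        # always call with a 3
    }


def best_response_second(p):
    """Pure best response for player #2.

    `p` maps "A".."F" to predicted probabilities for player #1
    (A, B, C: bets with a 1, 2, 3; D, E, F: calls after check-bet).
    """
    not_a, not_b, not_c = 1 - p["A"], 1 - p["B"], 1 - p["C"]
    return {
        "G": False,                                       # never call with a 1
        "H": 3 * p["A"] > p["C"],                         # call with a 2
        "I": True,                                        # always call with a 3
        # 2*P(!E) - P(E) equals 2 - 3*P(E); P(F) - 2*P(!F) equals 3*P(F) - 2
        "J": not_b * (2 - 3 * p["E"]) > not_c * (3 * p["F"] - 2),
        "K": not_a * p["D"] > not_c * (3 * p["F"] - 2),
        "L": not_a * p["D"] + not_b * p["E"] > 0,
    }


if __name__ == "__main__":
    # Hypothetical opponent that only ever puts chips in with a 3.
    passive_second = {"G": 0, "H": 0, "I": 1, "J": 0, "K": 0, "L": 1}
    print(best_response_first(passive_second))
```

Against that hypothetical opponent the sketch recommends bluffing with a 1, checking and folding with a 2, and always calling with a 3.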

Alert read­ers will com­plain that I’ve skipped over the most in­ter­est­ing step: pre­dict­ing what my op­po­nent will play. This is true, but the above steps needed to be done first, be­cause many of the in­ter­est­ing strate­gies for pre­dict­ing your op­po­nent’s play as­sume they’ve done the same anal­y­sis. If both play­ers play fol­low­ing this strat­egy, and both know that the other will play fol­low­ing this strat­egy, then play set­tles into one of the Nash equil­ibriums. But, many play­ers won’t play op­ti­mally, and if you can iden­tify de­vi­a­tions from the Nash equil­ibrium quickly then you can get a bet­ter score. If your op­po­nent is do­ing the same thing, then you can fake a de­vi­a­tion from Nash that low­ers your score a lit­tle, but causes your op­po­nent to de­vi­ate from the Nash equil­ibrium in a way that you can ex­ploit for more gain than your loss (un­til your op­po­nent catches on). So I can pre­dict you will pre­dict I will pre­dict you will… and it seems to go into an in­finite loop of ever-higher lev­els of dou­ble-think.

My most important takeaway from the Rock-Paper-Scissors competition was that if there are a finite number of deterministic strategies, then the number of levels of double-think is finite too. This is much easier to see in RPS. Given a method of prediction P:

P0: as­sume your op­po­nent is vuln­er­a­ble to pre­dic­tion by method P, play to beat it.

P1: as­sume your op­po­nent thinks you will use method P0, and plays to beat it. Play to beat that.

P2: as­sume your op­po­nent thinks you will use P1, and plays to beat it. Play to beat that.

But be­cause in RPS there are only 3 pos­si­ble de­ter­minis­tic strate­gies, P3 recom­mends you play the same way as P0!
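The cycle is easy to check with a few lines of Python. This is a minimal sketch: `BEATS` and `level_move` are names made up for the illustration, and the prediction from method P is passed in as a single move.

```python
# BEATS[x] is the move that beats x.
BEATS = {"R": "P", "P": "S", "S": "R"}


def level_move(predicted_opponent_move, level):
    """Move recommended by P<level>, given method P's prediction of the opponent."""
    move = BEATS[predicted_opponent_move]  # P0: beat the predicted move
    for _ in range(level):
        # The opponent expects `move` and plays BEATS[move]; we beat that.
        move = BEATS[BEATS[move]]
    return move


if __name__ == "__main__":
    for level in range(4):
        print(f"P{level}:", level_move("R", level))
```

If P predicts Rock, the levels recommend Paper, Rock, Scissors, and then Paper again, so P3 repeats P0.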

There’s also a sec­ond stack where you as­sume your op­po­nent is us­ing P to pre­dict you, then as­sum­ing you know that, and so on, which also ends with 3 de­ter­minis­tic strate­gies.

In sim­plified poker, if you pre­dict your op­po­nent is not play­ing a Nash equil­ibrium strat­egy, and re­spond op­ti­mally your­self, then you will re­spond in one of 16 ways. If you as­sume your op­po­nent has guessed your play and will re­spond op­ti­mally, then there are 8 ways for player #1 to re­spond, and only 4 ways for player #2 to re­spond. So, as­sum­ing I haven’t made a mis­take, there are at most 5 lev­els of sec­ond guess­ing, 1 for re­spond­ing to naive play, and at most 4 more for re­spond­ing to op­ti­mal play be­fore ei­ther you or your op­po­nent start re­peat­ing your­selves.

So, for any method of pre­dic­tion which does not in­volve dou­ble-think­ing, you can gen­er­ate all dou­ble-think strate­gies and re­verse dou­ble-think strate­gies. Then you need a meta-strat­egy to de­cide which one to use on the next hand. If you do this suc­cess­fully then you’ll defeat any­one who is vuln­er­a­ble to one of your meth­ods of pre­dic­tion, uses one of your meth­ods of pre­dic­tion, or uses a strat­egy to di­rectly defeat one of your meth­ods of pre­dic­tion.