Simplified Poker Conclusions

Pre­vi­ously: Sim­plified Poker, Sim­plified Poker Strategy

Re­lated (Eliezer Yud­kowsky): Meta Hon­esty: Firm­ing Hon­esty Around Its Edge Cases

About forty people submitted programs that used randomization. Several of those random programs correctly solved for the Nash equilibrium, and those did well.

I sub­mit­ted the only de­ter­minis­tic pro­gram.

I won go­ing away.

I broke even against the Nash pro­grams, ut­terly crushed vuln­er­a­ble pro­grams, and lost a non-triv­ial amount to only one pro­gram, a re­sound­ing heads-up defeat handed to me by the only other top-level gamer in the room, fel­low Magic: the Gather­ing semi-pro player Eric Phillips.

Like me, Eric had an es­cape hatch in his pro­gram that re­versed his de­ci­sions (rather than re­treat­ing to Nash) if he was los­ing by enough. Un­like me, his ac­tu­ally got im­ple­mented – the pro­fes­sor de­cided that given how well I was go­ing to do any­way, I’d hit the com­plex­ity limit, so my es­cape hatch was left out.

Rather than get into im­ple­men­ta­tion de­tails, or prov­ing the Nash equil­ibrium, I’ll dis­cuss two things: How few lev­els peo­ple play on, and the mo­ti­vat­ing point: How things are already more dis­tinct and ran­dom than you think they are, and how to take ad­van­tage of that.

Next Level

In the comments to the first two posts, most people focused on finding the Nash equilibrium. A few people tried to do something that would better exploit obviously stupid players, but none tried to discover the opponents’ strategy.

The only rea­son not to play an ex­ploitable strat­egy is if you’re wor­ried some­one will ex­ploit it!

Con­sider think­ing as hav­ing lev­els. Level N+1 at­tempts to op­ti­mize against Levels N and be­low, or just Level N.

Level 0 isn’t think­ing or op­ti­miz­ing, so higher lev­els all crush it, mostly.

Level 1 thinking picks actions that are generically powerful, likely to lead to good outcomes, without considering what opponents might do. Do ‘natural’ things.

Level 2 think­ing con­sid­ers what to do against op­po­nents us­ing Level 1 think­ing. You try to counter the ‘nat­u­ral’ ac­tions, and ex­ploit stan­dard be­hav­iors.

Level 3 coun­ters Level 2. You as­sume your op­po­nents are try­ing to ex­ploit ba­sic be­hav­iors, and at­tempt to ex­ploit those try­ing to do this.

Level 4 counters Level 3. You assume your opponents are trying to exploit exploitative behavior, and are acting accordingly. So you do what’s best against that.

And so on. Be­ing caught one level be­low your op­po­nent is death. Be­ing one level ahead is amaz­ing. Two or more lev­els differ­ent, and strange things hap­pen.
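To make the ladder concrete, here is a minimal sketch using rock-paper-scissors as a stand-in game. The game, the choice of ‘rock’ as the natural Level 1 action, and the names `level_action` and `payoff` are all assumptions for illustration, not anything from the poker contest. Each level simply plays the best response to the level below it.

```python
# Toy model of levels: each level best-responds to the level below.
# Rock-paper-scissors and the "natural" action are illustrative assumptions.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}  # COUNTER[x] beats x


def level_action(level, natural="rock"):
    """Action chosen at a given level (1 or higher); Level 1 plays the natural action."""
    if level < 1:
        raise ValueError("Level 0 does not optimize, so it has no defined action here")
    action = natural
    for _ in range(level - 1):
        action = COUNTER[action]  # counter whatever the level below would do
    return action


def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss, from my point of view."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


one_ahead = payoff(level_action(2), level_action(1))    # one level ahead
two_ahead = payoff(level_action(3), level_action(1))    # two levels ahead
three_ahead = payoff(level_action(4), level_action(1))  # three levels ahead
```

In this toy game, being one level ahead wins, being two levels ahead loses, and being three levels ahead ties, because the counters cycle. Real games are not this tidy, but it shows why overshooting by a level is not automatically safe.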

Life is messy. Poli­ti­cal cam­paigns, ma­jor cor­po­ra­tion strate­gic plans, the­aters of war. The big stuff. A lot of Level 0. Level 1 is in­dus­try stan­dard. Level 2 is in­spired, ex­cep­tional. Level 3 is the stuff of leg­end.

In well-defined situ­a­tions where losers are strongly filtered out, such as tour­na­ments, you can get glim­mers of high level be­hav­ior. But mostly, you get it by chang­ing the view of what Level 1 is. The old Level 2 and Level 3 strate­gies be­come the new ‘rules of the game’. The brain chunks them into ba­sic ac­tions. Only then can the cy­cle be­gin again.

Also, ‘get­ting’ some­one with Level 3 think­ing risks giv­ing the game away. What level should one be on next time, then?

Effec­tive Randomization

There is a strong in­stinct that when­ever pre­dictable be­hav­ior can be pun­ished, one must ran­dom­ize one’s be­hav­ior.

That’s true. But only from an­other’s point of view. You can’t be pre­dictable, but that doesn’t mean you need to be ran­dom.

It’s an­other form of illu­sion of trans­parency. If you think about a prob­lem differ­ently than oth­ers, their at­tempts to pre­dict or model you will get it wrong. The only re­quire­ment is that your de­ci­sion pro­cess is com­plex, and doesn’t re­duce to a sim­ple model.

If you also have differ­ent in­for­ma­tion than they do, that’s even bet­ter.

When an­a­lyz­ing the hand his­to­ries, I know what cards I was dealt, and use that to de­duce what cards my op­po­nent likely held, and in turn guess their be­hav­iors. Thus, my op­po­nent likely has no clue ei­ther what pro­cess I’m us­ing, how I im­ple­mented it, or what data I’m feed­ing into it. All of that is effec­tive ran­dom­iza­tion.

If that re­duces to me always bet­ting with a 1, they might catch on even­tu­ally. But since I’m con­stantly re-eval­u­at­ing what they’re do­ing, and re­act­ing ac­cord­ingly, on an im­pos­si­ble-to-pre­dict sched­ule, such catch­ing on might end up back­firing. It’s the same at a hu­man poker table. If you’re good enough at read­ing peo­ple to figure out what I’m think­ing and stay one step ahead, I need to re­treat to Nash, but that’s rare. Mostly, I only need to worry, at most, if my ac­tions are effec­tively do­ing some­thing sim­ple and easy to model.

Play­ing the same ex­act sce­nar­ios, or with the same ex­act peo­ple, or both, for long enough, both in­creases the amount of data available for anal­y­sis, and re­duces the ran­dom­ness be­hind it. Even­tu­ally, such tac­tics stop work­ing. But it takes a while, and the more you care about long his­to­ries in non-ob­vi­ous ways, the longer it will take.

Rather than be ac­tu­ally ran­dom, in­stead one ad­justs when one’s be­hav­ior has suffi­ciently de­vi­ated from what would look ran­dom, such that oth­ers will likely ad­just to ac­count for it. That ad­just­ment, too, need not be ran­dom.

Rush­ing into do­ing things to mix up your play, be­fore oth­ers have any data to work with, only leaves value on the table.

One strong strat­egy when one needs to mix it up is to do what the de­tails fa­vor. Thus, if there’s some­thing you need to oc­ca­sion­ally do, and to­day is an un­usu­ally good day for it, or now an es­pe­cially good time, do it now, and ad­just your thresh­old for that de­pend­ing on how of­ten you’ve done it re­cently.
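As a rough sketch of that idea, the rule below is fully deterministic: act when today's conditions clear a threshold, and move the threshold up or down depending on how often you have acted recently. The class name `ThresholdMixer`, the favorability score between 0 and 1, and every parameter value are hypothetical choices for illustration.

```python
from collections import deque


class ThresholdMixer:
    """Deterministic 'mix it up' rule: act when conditions are good enough,
    and raise the bar after acting often, lower it after acting rarely.
    All parameters are illustrative assumptions."""

    def __init__(self, target_rate=0.25, window=20, base_threshold=0.5, sensitivity=1.0):
        self.target_rate = target_rate        # how often we want to act overall
        self.base_threshold = base_threshold  # bar when we are exactly on target
        self.sensitivity = sensitivity        # how hard to correct deviations
        self.history = deque(maxlen=window)   # recent decisions (True = acted)

    def current_threshold(self):
        if not self.history:
            return self.base_threshold
        recent_rate = sum(self.history) / len(self.history)
        return self.base_threshold + self.sensitivity * (recent_rate - self.target_rate)

    def decide(self, favorability):
        """favorability: score in [0, 1] for how good the details are right now."""
        act = favorability >= self.current_threshold()
        self.history.append(act)
        return act


mixer = ThresholdMixer(window=4)
decisions = [mixer.decide(f) for f in (0.6, 0.6, 0.6, 0.6)]
```

With these numbers, the same favorability of 0.6 triggers the action the first time, is skipped right afterwards because the action was just taken, and triggers again once the recent rate has dropped. An observer sees something that looks mixed, yet no random number was drawn.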

A mis­take I of­ten make is to choose ac­tions as if I was as­sum­ing oth­ers know my de­ci­sion al­gorithm and will ex­ploit that to ex­tract all the in­for­ma­tion. Most of the time this is silly.

This brings us to the is­sue of Glo­ma­riza­tion.

Glomarization

Are you har­bor­ing any crim­i­nals? Did you rob a bank? Is there a tap on my phone? Does this make me look fat?

If when the an­swer is no I would tell you no, then re­fus­ing to an­swer is the same as say­ing yes. So if you want to avoid ly­ing, and want to keep se­crets, you need to some­times re­fuse to an­swer ques­tions, to avoid mak­ing re­fus­ing to an­swer too mean­ingful an ac­tion. Eliezer dis­cussed such is­sues re­cently.

This sec­tion was the origi­nal mo­ti­va­tion for writ­ing the poker se­ries up now, but hav­ing writ­ten it, I think a full treat­ment should mostly just be its own thing. And I’m not happy with my abil­ity to ex­plain these con­cepts con­cisely. But a few thoughts here.

The ad­van­tage of fully ex­plicit meta-hon­esty, tel­ling peo­ple ex­actly un­der what con­di­tions you would lie or re­fuse to share in­for­ma­tion, is that it pro­tects a sys­tem of full, re­li­able hon­esty.

The prob­lem with fully ex­plicit meta-hon­esty is that it vastly ex­pands the nec­es­sary amount of Glo­ma­riza­tion to say ex­actly when you would use it.

Eliezer correctly points out that if the Feds ask you where you were last night, your answer of ‘I can neither confirm nor deny where I was last night’ is going to sound mighty suspicious regardless of how often you answer that way. Saying ‘none of your goddamn business’ is only marginally better. Also, letting them know that you always refuse to answer that question might not be the best way to make them think you’re less suspicious.

This means that full Glomarization isn’t practical unless (this actually does come up) your response to a question can reliably be ‘that’s a trap!’.

How­ever, par­tial Glo­ma­riza­tion is fine. As long as you mix in some re­fus­ing to an­swer when the an­swer wouldn’t hurt you, peo­ple don’t know much. Most im­por­tantly, they don’t know how of­ten you’d re­fuse to an­swer.

If the last five times you’ve refused to answer whether there was a dragon in your garage, there was a dragon in your garage, your refusal to answer is rather strong evidence there’s a dragon in your garage.

If it only happened one of the last five times, then there’s certainly a Bayesian update one can make, but you don’t know how often there’s a Glomarization there, so it’s hard to know how much to update on that. The key question is, what’s the threshold where they feel the need to look in your garage? Can you muddy the waters enough to avoid that?
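The size of that update can be sketched with Bayes’ rule. The function name `posterior_dragon` and all the numbers below (a 10% prior, the refusal rates) are made up for illustration; the point is only how much the refusal rate without a dragon matters.

```python
def posterior_dragon(prior, p_refuse_if_dragon, p_refuse_if_no_dragon):
    """P(dragon | you refused to answer), by Bayes' rule.
    All inputs are probabilities assumed by the observer."""
    joint_dragon = prior * p_refuse_if_dragon
    joint_no_dragon = (1 - prior) * p_refuse_if_no_dragon
    total = joint_dragon + joint_no_dragon
    if total == 0:
        raise ValueError("a refusal is impossible under these assumptions")
    return joint_dragon / total


# You only ever refuse when there is a dragon: refusing is a confession.
no_glomarization = posterior_dragon(0.10, 1.0, 0.0)

# You also refuse 20% of the time when the garage is empty.
partial_glomarization = posterior_dragon(0.10, 1.0, 0.2)

# You refuse equally often either way: refusing carries no information.
full_glomarization = posterior_dragon(0.10, 0.5, 0.5)
```

Under these assumed numbers the posterior is 1.0 with no Glomarization, roughly 0.36 with partial Glomarization, and stays at the 0.10 prior with full Glomarization. An observer who does not know the empty-garage refusal rate cannot pin down where in that range they are.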

Once you’re do­ing that, it is al­most cer­tainly fine to an­swer ‘no’ when it es­pe­cially mat­ters that they know there isn’t a dragon there, be­cause they don’t know when it’s im­por­tant, or what rule you’re fol­low­ing. If you went and told them ex­actly when you an­swer the ques­tion, it would be bad. But if they’re not sure, it’s fine.

One can complement that by understanding how conversations and topics develop, and not setting yourself up for questions you don’t want to answer. If you have a dragon in your garage and don’t want to lie about it or reveal that it’s there, it’s a really bad idea to talk about the idea of dragons in garages. Someone is going to ask. So when your refusal to answer would be suspicious, especially when it would be a potential sign of a heretical belief, the best strategy is to not get into position to get asked.

Which, in turn, means avoiding perfectly harmless things gently, invisibly, without saying that this is what you’re doing. Posts that don’t get written, statements not made, rather than questions not answered. As a new practitioner of such arts, hard and fast rules are good. As an expert, they only serve to give the game away.

Re­mem­ber the illu­sion of trans­parency. Your coun­ter­fac­tual selves would need to act differ­ently. But if no one knows that, it’s not a prob­lem.