Sim­pli­fied Poker Conclusions

Pre­vi­ously: Sim­pli­fied Poker, Sim­pli­fied Poker Strategy

Related (Eliezer Yudkowsky): Meta Hon­esty: Firm­ing Hon­esty Around Its Edge Cases

About forty people submitted programs that used randomization. Several of those random programs correctly solved for the Nash equilibrium, and those programs did well.

I sub­mit­ted the only de­term­in­istic pro­gram.

I won go­ing away.

I broke even against the Nash pro­grams, ut­terly crushed vul­ner­able pro­grams, and lost a non-trivial amount to only one pro­gram, a re­sound­ing heads-up de­feat handed to me by the only other top-level gamer in the room, fel­low Ma­gic: the Gath­er­ing semi-pro player Eric Phil­lips.

Like me, Eric had an es­cape hatch in his pro­gram that re­versed his de­cisions (rather than re­treat­ing to Nash) if he was los­ing by enough. Un­like me, his ac­tu­ally got im­ple­men­ted – the pro­fessor de­cided that given how well I was go­ing to do any­way, I’d hit the com­plex­ity limit, so my es­cape hatch was left out.

Rather than get into im­ple­ment­a­tion de­tails, or prov­ing the Nash equi­lib­rium, I’ll dis­cuss two things: How few levels people play on, and the mo­tiv­at­ing point: How things are already more dis­tinct and ran­dom than you think they are, and how to take ad­vant­age of that.

Next Level

In the comments to the first two posts, most people focused on finding the Nash equilibrium. A few people tried to do something that would better exploit obviously stupid players, but none tried to discover the opponents’ strategy.

The only reason not to play an ex­ploit­able strategy is if you’re wor­ried someone will ex­ploit it!

Con­sider think­ing as hav­ing levels. Level N+1 at­tempts to op­tim­ize against Levels N and be­low, or just Level N.

Level 0 isn’t think­ing or op­tim­iz­ing, so higher levels all crush it, mostly.

Level 1 thinking picks actions that are generically powerful, likely to lead to good outcomes, without considering what opponents might do. Do ‘natural’ things.

Level 2 think­ing con­siders what to do against op­pon­ents us­ing Level 1 think­ing. You try to counter the ‘nat­ural’ ac­tions, and ex­ploit stand­ard be­ha­vi­ors.

Level 3 coun­ters Level 2. You as­sume your op­pon­ents are try­ing to ex­ploit ba­sic be­ha­vi­ors, and at­tempt to ex­ploit those try­ing to do this.

Level 4 coun­ters Level 3. You as­sume your op­pon­ents are try­ing to ex­ploit ex­ploit­at­ive be­ha­vior, and act­ing ac­cord­ingly. So you do what’s best against that.

And so on. Be­ing caught one level be­low your op­pon­ent is death. Be­ing one level ahead is amaz­ing. Two or more levels dif­fer­ent, and strange things hap­pen.
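A minimal sketch of the level structure, using rock-paper-scissors as a stand-in game. The game, the choice of "rock" as the ‘natural’ Level 1 move, and the function names are all assumptions made for illustration; none of this is the tournament code. It shows each level countering the one directly below it, and what can happen when the gap is two levels or more.

```python
# Toy model: rock-paper-scissors stands in for any game with a counter-cycle.
# BEATS[x] is the move that beats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def level_action(level, natural="rock"):
    """Action of a Level `level` thinker (level >= 1).

    Level 1 plays the assumed 'natural' move; each higher level plays
    whatever beats the level directly below it.
    """
    if level < 1:
        raise ValueError("Level 0 does not optimize, so it has no defined action here")
    action = natural
    for _ in range(level - 1):
        action = BEATS[action]
    return action


def result(mine, theirs):
    """+1 if `mine` beats `theirs`, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1


if __name__ == "__main__":
    for my_level in (2, 3, 4):
        outcome = result(level_action(my_level), level_action(1))
        print(f"Level {my_level} vs Level 1: {outcome:+d}")
```

In this toy, Level 2 beats Level 1, but Level 3 loses to Level 1 and Level 4 ties it. The exact wraparound is an artifact of a three-move cycle; real games rarely cycle this neatly, but it gives a concrete instance of strange things happening at a gap of two or more levels.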

Life is messy. Polit­ical cam­paigns, ma­jor cor­por­a­tion stra­tegic plans, theat­ers of war. The big stuff. A lot of Level 0. Level 1 is in­dustry stand­ard. Level 2 is in­spired, ex­cep­tional. Level 3 is the stuff of le­gend.

In well-defined situ­ations where losers are strongly filtered out, such as tour­na­ments, you can get glim­mers of high level be­ha­vior. But mostly, you get it by chan­ging the view of what Level 1 is. The old Level 2 and Level 3 strategies be­come the new ‘rules of the game’. The brain chunks them into ba­sic ac­tions. Only then can the cycle be­gin again.

Also, ‘get­ting’ someone with Level 3 think­ing risks giv­ing the game away. What level should one be on next time, then?

Ef­fect­ive Randomization

There is a strong in­stinct that whenever pre­dict­able be­ha­vior can be pun­ished, one must ran­dom­ize one’s be­ha­vior.

That’s true. But only from an­other’s point of view. You can’t be pre­dict­able, but that doesn’t mean you need to be ran­dom.

It’s an­other form of il­lu­sion of trans­par­ency. If you think about a prob­lem dif­fer­ently than oth­ers, their at­tempts to pre­dict or model you will get it wrong. The only re­quire­ment is that your de­cision pro­cess is com­plex, and doesn’t re­duce to a simple model.

If you also have dif­fer­ent in­form­a­tion than they do, that’s even bet­ter.

When ana­lyz­ing the hand his­tor­ies, I know what cards I was dealt, and use that to de­duce what cards my op­pon­ent likely held, and in turn guess their be­ha­vi­ors. Thus, my op­pon­ent likely has no clue either what pro­cess I’m us­ing, how I im­ple­men­ted it, or what data I’m feed­ing into it. All of that is ef­fect­ive ran­dom­iz­a­tion.

If that re­duces to me al­ways bet­ting with a 1, they might catch on even­tu­ally. But since I’m con­stantly re-eval­u­at­ing what they’re do­ing, and re­act­ing ac­cord­ingly, on an im­possible-to-pre­dict sched­ule, such catch­ing on might end up back­fir­ing. It’s the same at a hu­man poker table. If you’re good enough at read­ing people to fig­ure out what I’m think­ing and stay one step ahead, I need to re­treat to Nash, but that’s rare. Mostly, I only need to worry, at most, if my ac­tions are ef­fect­ively do­ing some­thing simple and easy to model.

Play­ing the same ex­act scen­arios, or with the same ex­act people, or both, for long enough, both in­creases the amount of data avail­able for ana­lysis, and re­duces the ran­dom­ness be­hind it. Even­tu­ally, such tac­tics stop work­ing. But it takes a while, and the more you care about long his­tor­ies in non-ob­vi­ous ways, the longer it will take.

Rather than be ac­tu­ally ran­dom, in­stead one ad­justs when one’s be­ha­vior has suf­fi­ciently de­vi­ated from what would look ran­dom, such that oth­ers will likely ad­just to ac­count for it. That ad­just­ment, too, need not be ran­dom.

Rush­ing into do­ing things to mix up your play, be­fore oth­ers have any data to work with, only leaves value on the table.

One strong strategy when one needs to mix it up is to do what the de­tails fa­vor. Thus, if there’s some­thing you need to oc­ca­sion­ally do, and today is an un­usu­ally good day for it, or now an es­pe­cially good time, do it now, and ad­just your threshold for that de­pend­ing on how of­ten you’ve done it re­cently.
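One way to sketch that rule in code, assuming each situation can be scored for how much the details favor the occasional action. The class name, the 0-to-1 favorability scale, and every default number are hypothetical choices for illustration. The decision is fully deterministic: act when the current spot clears a threshold, and raise or lower that threshold according to how often you have acted recently.

```python
from collections import deque


class DetailDrivenMixer:
    """Decide deterministically when to take an 'occasional' action.

    All names and default numbers are illustrative assumptions.
    """

    def __init__(self, target_rate=0.25, window=20, base_threshold=0.5, sensitivity=1.0):
        self.target_rate = target_rate        # how often we want to act, long run
        self.base_threshold = base_threshold  # bar to clear with no history
        self.sensitivity = sensitivity        # how hard recent history moves the bar
        self.recent = deque(maxlen=window)    # 1 if we acted, 0 if not

    def threshold(self):
        if not self.recent:
            return self.base_threshold
        recent_rate = sum(self.recent) / len(self.recent)
        # Acted more than the target lately: demand a better spot.
        # Acted less than the target lately: accept a worse one.
        return self.base_threshold + self.sensitivity * (recent_rate - self.target_rate)

    def decide(self, favorability):
        """`favorability` scores the current spot on an assumed 0..1 scale."""
        act = favorability >= self.threshold()
        self.recent.append(1 if act else 0)
        return act


if __name__ == "__main__":
    mixer = DetailDrivenMixer()
    print([mixer.decide(0.6) for _ in range(4)])
```

Fed the same moderately good spot four times in a row, this sketch acts, declines twice while the threshold is elevated, then acts again. An observer who does not see the favorability scores or the threshold rule sees something that looks mixed, even though nothing in it is random.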

A mis­take I of­ten make is to choose ac­tions as if I was as­sum­ing oth­ers know my de­cision al­gorithm and will ex­ploit that to ex­tract all the in­form­a­tion. Most of the time this is silly.

This brings us to the is­sue of Glo­mar­iz­a­tion.

Glomarization

Are you har­bor­ing any crim­in­als? Did you rob a bank? Is there a tap on my phone? Does this make me look fat?

If when the an­swer is no I would tell you no, then re­fus­ing to an­swer is the same as say­ing yes. So if you want to avoid ly­ing, and want to keep secrets, you need to some­times re­fuse to an­swer ques­tions, to avoid mak­ing re­fus­ing to an­swer too mean­ing­ful an ac­tion. Eliezer dis­cussed such is­sues re­cently.

This sec­tion was the ori­ginal mo­tiv­a­tion for writ­ing the poker series up now, but hav­ing writ­ten it, I think a full treat­ment should mostly just be its own thing. And I’m not happy with my abil­ity to ex­plain these con­cepts con­cisely. But a few thoughts here.

The ad­vant­age of fully ex­pli­cit meta-hon­esty, telling people ex­actly un­der what con­di­tions you would lie or re­fuse to share in­form­a­tion, is that it pro­tects a sys­tem of full, re­li­able hon­esty.

The prob­lem with fully ex­pli­cit meta-hon­esty is that it vastly ex­pands the ne­ces­sary amount of Glo­mar­iz­a­tion to say ex­actly when you would use it.

Eliezer correctly points out that if the Feds ask you where you were last night, your answer of ‘I can neither confirm nor deny where I was last night’ is going to sound mighty suspicious regardless of how often you answer that way. Saying ‘none of your goddamn business’ is only marginally better. Also, letting them know that you always refuse to answer that question might not be the best way to make them think you’re less suspicious.

This means that full Glomarization isn’t practical unless (this actually does come up) your response to a question can reliably be ‘that’s a trap!’.

However, par­tial Glo­mar­iz­a­tion is fine. As long as you mix in some re­fus­ing to an­swer when the an­swer wouldn’t hurt you, people don’t know much. Most im­port­antly, they don’t know how of­ten you’d re­fuse to an­swer.

If, each of the last five times you refused to say whether there was a dragon in your garage, there was in fact a dragon in your garage, then your refusal to answer is rather strong evidence there’s a dragon in your garage.

If it only happened one of the last five times, then there’s certainly a Bayesian update one can make, but you don’t know how often there’s a Glomarization there, so it’s hard to know how much to update on that. The key question is, what’s the threshold where they feel the need to look in your garage? Can you muddy the waters enough to avoid that?
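The size of that update can be made concrete with Bayes’ rule. This is a sketch under assumed numbers: the function name, the 10% prior, and the refusal rates are all made up for illustration. It computes how likely a dragon is, given that you refused to answer.

```python
def posterior_dragon(prior, refuse_if_dragon, refuse_if_no_dragon):
    """P(dragon | refusal) by Bayes' rule.

    prior:               P(dragon) before the question is asked
    refuse_if_dragon:    P(refuse | dragon)
    refuse_if_no_dragon: P(refuse | no dragon), the rate of 'harmless' refusals
    """
    evidence = refuse_if_dragon * prior + refuse_if_no_dragon * (1 - prior)
    if evidence == 0:
        raise ValueError("a refusal has probability zero under these assumptions")
    return refuse_if_dragon * prior / evidence


if __name__ == "__main__":
    # Assumed 10% prior; always refuse when there is a dragon.
    for harmless_rate in (0.0, 0.5, 1.0):
        p = posterior_dragon(0.10, 1.0, harmless_rate)
        print(f"refuse {harmless_rate:.0%} of the time with no dragon -> P(dragon | refusal) = {p:.3f}")
```

With no harmless refusals, a refusal is a confession. Refusing half the time when there is no dragon brings the posterior down to about 18%, and refusing every time leaves the asker at their prior. The asker has to estimate the harmless-refusal rate from a handful of observations, which is why partial Glomarization muddies the waters as much as it does.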

Once you’re do­ing that, it is al­most cer­tainly fine to an­swer ‘no’ when it es­pe­cially mat­ters that they know there isn’t a dragon there, be­cause they don’t know when it’s im­port­ant, or what rule you’re fol­low­ing. If you went and told them ex­actly when you an­swer the ques­tion, it would be bad. But if they’re not sure, it’s fine.

One can complement that by understanding how conversations and topics develop, and not setting yourself up for questions you don’t want to answer. If you have a dragon in your garage and don’t want to lie about it or reveal that it’s there, it’s a really bad idea to talk about the idea of dragons in garages. Someone is going to ask. So when your refusal to answer would be suspicious, especially when it would be a potential sign of a heretical belief, the best strategy is to not get into position to get asked.

Which, in turn, means avoiding perfectly harmless things gently, invisibly, without saying that this is what you’re doing. Posts that don’t get written, statements not made, rather than questions not answered. As a new practitioner of such arts, hard and fast rules are good. As an expert, they only serve to give the game away.

Re­mem­ber the il­lu­sion of trans­par­ency. Your coun­ter­fac­tual selves would need to act dif­fer­ently. But if no one knows that, it’s not a prob­lem.