Poker example: (not) deducing someone’s preferences

I’ve shown that it is, theoretically, impossible to deduce the preferences and rationality of an agent by looking at their actions or policy.

That argument is valid, but it feels somewhat abstract, relying on “fully anti-rational” agents and other “obviously ridiculous” preferences.

In this post, I’ll present a simple, realistic example of human behaviour where the person’s preferences cannot be deduced. The example was developed by Xavier O’rourke.

The motivations and beliefs of a poker player

In this example, Alice is playing Bob at poker, and they are on their last round. Alice might believe that Bob has a better hand, or a worse one. She may be maximising her expected income, or minimising it (why? read on). Even under questioning, it is impossible to distinguish “Alice believes Bob has a worse hand and is maximising her income” from “Alice believes Bob has a better hand and is minimising her income”. Similarly, “Alice believes Bob has a worse hand and is minimising her income” is indistinguishable from “Alice believes Bob has a better hand and is maximising her income”.

If we want to be specific, imagine that we are observing Alice playing a game of Texas hold’em. Before the river (the final round of betting), everyone has folded except Alice and Bob. Alice’s hole cards, together with the board (the five cards both players have in common), give her four-of-a-kind in tens.

Alice can only lose if Bob holds one specific pair of hole cards, which would give him a straight flush. For simplicity, assume Bob has raised and Alice can only call or fold (she is out of money to re-raise), and that Bob cannot respond to either, so his actions are irrelevant. He has played this hand, so far, with great confidence.

Alice can have two heuristic models of Bob’s hand. In the first, she assumes that the probability of Bob holding that specific pair is very low, so she almost certainly has the better hand. In the second, she notes Bob’s great confidence and concludes that he is quite likely to have that pair.

What does Alice want? Well, one obvious goal is to maximise money, with a reward $R_{\text{money}}$ that is linear in money. However, it’s possible that Alice doesn’t care about how much money she takes home; she’d rather take Bob home instead, with reward $R_{\text{Bob}}$, and she thinks that putting Bob in a good mood by letting him win at poker will make him more receptive to her advances later in the evening. In this case Alice wants to lose as much money as she can in this hand, so, in this specific situation, $R_{\text{Bob}} = -R_{\text{money}}$.
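A short worked version of the decision, assuming Alice simply maximises expected reward. The symbols are introduced here for illustration and are not part of the original example: $p$ is Alice’s probability that Bob holds the straight-flush pair, $P$ is the pot before her call (including Bob’s raise), and $c$ is the cost of calling. Measured relative to folding, calling changes her money, in expectation, by

$$\mathbb{E}[\Delta m \mid \text{call}] = (1-p)\,P - p\,c .$$

With a reward linear in money ($R_{\text{money}}$) she calls exactly when this is positive:

$$(1-p)\,P - p\,c > 0 \iff p < \frac{P}{P+c}.$$

With the Bob-pleasing reward, which in this hand is $R_{\text{Bob}} = -R_{\text{money}}$, the sign flips and she calls exactly when $p > \frac{P}{P+c}$. So if the first model puts $p$ near zero and the second puts it above the threshold (an assumption about how confident “quite likely” is), the money-maximising Alice calls under the first model and folds under the second, while the Bob-pleasing Alice does the opposite.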

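The following table then represents Alice’s action, as a function of her model and reward function:

                                  Maximise money    Please Bob (lose money)
Believes Bob’s hand is worse      call              fold
Believes Bob’s hand is better     fold              call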

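Thus, for example, if she wants to maximise money and believes Bob doesn’t have the winning hand, she should call. Similarly, wanting to please Bob while believing he has the better hand results in Alice calling (because she believes she will lose if both players show their cards, and she wants to lose). Conversely, wanting to please Bob while believing she has the better hand, or wanting money while believing Bob has the better hand, results in Alice folding.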
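A minimal Python sketch of where that table comes from, assuming Alice is an expected-reward maximiser. The stakes (`POT`, `CALL_COST`) and the probabilities attached to each model are made-up illustrative numbers, not part of the original example; the only structural assumption is that pleasing Bob is worth exactly minus the money in this hand.

```python
from itertools import product

# Hypothetical stakes, for illustration only (not from the original example).
POT = 100.0       # chips already in the pot, including Bob's raise
CALL_COST = 50.0  # chips Alice must add to call

# Each heuristic model is summarised as Alice's (assumed) probability that Bob
# holds the one pair of hole cards giving him a straight flush.
MODELS = {
    "Bob worse": 0.01,   # that specific pair judged very unlikely
    "Bob better": 0.9,   # Bob's confidence read as strong evidence
}

# Rewards as functions of Alice's change in money, measured relative to folding.
# In this hand, pleasing Bob is exactly the negative of money.
REWARDS = {
    "money": lambda dm: dm,
    "Bob": lambda dm: -dm,
}


def expected_reward(action, p_bob_wins, reward):
    """Expected reward of an action, with money measured relative to folding."""
    if action == "fold":
        return reward(0.0)
    # Calling wins the pot with probability 1 - p and loses the call otherwise.
    return (1 - p_bob_wins) * reward(POT) + p_bob_wins * reward(-CALL_COST)


def best_action(model, reward_name):
    """The action an expected-reward maximiser picks for a (model, reward) pair."""
    p = MODELS[model]
    reward = REWARDS[reward_name]
    return max(("call", "fold"), key=lambda a: expected_reward(a, p, reward))


ACTION_TABLE = {(m, r): best_action(m, r) for m, r in product(MODELS, REWARDS)}

for (model, reward_name), action in ACTION_TABLE.items():
    print(f"{model:<10} | {reward_name:<5} -> {action}")
```

Any probabilities on either side of the pot-odds threshold (here 100 / 150) would give the same table; the specific numbers only make the sketch runnable.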

Thus observing Alice’s behaviour constrains neither her beliefs nor her preferences, though it does constrain the combination of the two.
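To make this concrete, here is a small Python sketch that inverts the table: given an observed action, it lists every (belief, reward) pair that would have produced it. The string labels are illustrative names for the two models and two rewards described above, not notation from the original.

```python
# Alice's policy from the table: (belief about Bob's hand, reward) -> action.
POLICY = {
    ("Bob worse", "money"): "call",
    ("Bob better", "Bob"): "call",
    ("Bob worse", "Bob"): "fold",
    ("Bob better", "money"): "fold",
}


def consistent_explanations(observed_action):
    """All (belief, reward) pairs under which Alice would take observed_action."""
    return {pair for pair, action in POLICY.items() if action == observed_action}


for action in ("call", "fold"):
    pairs = consistent_explanations(action)
    beliefs = sorted({belief for belief, _ in pairs})
    rewards = sorted({reward for _, reward in pairs})
    # Each action rules out two of the four pairs, yet both beliefs and both
    # rewards survive: only the combination is constrained.
    print(f"{action}: pairs={sorted(pairs)} beliefs={beliefs} rewards={rewards}")
```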

Alice’s overall actions

Can we really not figure out what Alice wants here? What if we just waited to see her previous or subsequent behaviour? Or if we simply asked her what she wanted?

Unfortunately, neither of these may suffice. Even if Alice is mainly a money maximiser, she might still pursue Bob as a consolation prize; even if she was mainly interested in Bob, she might previously have played aggressively to win money, reasoning that Bob is more likely to savour a final victory against a worthy-seeming opponent.

As for asking Alice: sexual preferences and poker strategies are areas where humans are incredibly motivated to lie and mislead. Why confess to a desire when confessing might make it impossible to achieve? Or be unduly honest about how you analyse poker hands? Conversely, honesty and double-bluffs are also options, so her answers cannot reliably be read in either direction.

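Thus, it is plausible that Alice’s total behaviour could be identical in the (maximise money, Bob has the worse hand) and (please Bob, Bob has the better hand) cases, and likewise in the (please Bob, Bob has the worse hand) and (maximise money, Bob has the better hand) cases, not allowing us to distinguish these. Or at least, not allowing us to distinguish them with much confidence.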

Adding more details

It might be objected that the problem above is overly narrow, and that if we expanded the space of actions, Alice’s preferences would become clear.

That is likely to be the case; but the space of beliefs and rewards was also narrow. We could allow Alice to raise as well (perhaps with the goal of tricking Bob into folding); with three actions, we may be better able to distinguish between the four possible (belief, reward) pairs. But we can then give Alice more models of how Bob would react, increasing the space of possibilities. We could also consider more possible motives for Alice: she might have a risk-averse, money-loving utility, and/or some mix of the money-maximising and Bob-pleasing rewards.
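A rough Python sketch of how enlarging the hypothesis space can leave the problem just as hard. It assumes, hypothetically, that Alice’s belief is a single probability `p` that Bob holds the straight-flush pair, and that her reward mixes money (weight 1 - `w`) with pleasing Bob (weight `w`), where pleasing Bob is worth minus the money in this hand. The stakes are made-up numbers, and raising and risk aversion are left out; they would only add further parameters.

```python
from itertools import product

# Hypothetical stakes, for illustration only.
POT = 100.0       # chips already in the pot, including Bob's raise
CALL_COST = 50.0  # chips Alice must add to call


def expected_money_from_calling(p_bob_wins):
    """Expected change in Alice's money from calling instead of folding."""
    return (1 - p_bob_wins) * POT - p_bob_wins * CALL_COST


def best_action(p_bob_wins, bob_weight):
    """Action for a reward mixing money (weight 1 - w) with pleasing Bob (weight w).

    Since pleasing Bob is worth minus the money here, the mixture equals
    (1 - 2w) times money, so only its sign matters. Exact indifference
    (w = 0.5) is broken towards folding, an arbitrary choice.
    """
    value = (1 - 2 * bob_weight) * expected_money_from_calling(p_bob_wins)
    return "call" if value > 0 else "fold"


# Grid of candidate explanations: belief p and reward weight w, both in [0, 1].
grid = [i / 10 for i in range(11)]
explanations = {"call": [], "fold": []}
for p, w in product(grid, grid):
    explanations[best_action(p, w)].append((p, w))

for action, pairs in explanations.items():
    ps = sorted({p for p, _ in pairs})
    ws = sorted({w for _, w in pairs})
    print(f"{action}: {len(pairs)} (p, w) pairs; "
          f"p spans {ps[0]}..{ps[-1]}, w spans {ws[0]}..{ws[-1]}")
```

Each action remains consistent with both low and high values of `p` and of `w`; observing the action only tells us which side of a curve in the (`p`, `w`) plane Alice is on.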

It’s therefore not clear that “expanding” the problem, or making it more realistic, would make it any easier to deduce what Alice wants.