Monty Hall in the Wild

Cross-pos­ted from Putanu­

I visited a friend yesterday, and after we finished our first bottle of Merlot I brought up the Monty Hall problem, as one does. My friend confessed that she had never heard of the problem. Here’s how it goes:

You are faced with a choice of three doors. Be­hind one of them is a shiny pile of uti­lons, while the other two are hid­ing avo­ca­dos that are just start­ing to get rot­ten and maybe you can con­vince your­self that they’re still OK to eat but you’ll im­me­di­ately re­gret it if you try. The ori­ginal for­mu­la­tion talks about a car and two goats, which al­ways con­fused me be­cause goats are bet­ter for ra­cing than cars.

Any­way, you point to Door A, at which point the host of the show (yeah, it’s a show, you’re on TV) opens one of the other doors to re­veal an al­most-rot­ten avo­cado. The host knows what hides be­hind each door, al­ways of­fers the switch, and never opens the one with the prize be­cause that would make for bor­ing TV.

You now have the op­tion of stick­ing with Door A or switch­ing to the third door that wasn’t opened. Should you switch?

After my friend figured out the an­swer she com­men­ted that:

  1. The cor­rect an­swer is ob­vi­ous.

  2. The prob­lem is pretty bor­ing.

  3. Monty Hall isn’t rel­ev­ant to any­thing you would ever come across in real life.

Wrong, wrong, and wrong.

According to Wikipedia, only 13% of people figure out that switching doors improves your chance of winning from 1/3 to 2/3. But even more interesting is the fact that an incredible number of people remain unconvinced even after seeing several proofs, simulations, and demonstrations. The wrong answer, that the doors have an equal 1/2 chance of avocado, is so intuitively appealing that even educated people cannot overcome that intuition with mathematical reason.
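If the two-thirds figure still feels wrong, a quick simulation is the cheapest way to check it. Here's a minimal Python sketch of the game as described above. The names `play` and `win_rate` are mine, and I'm assuming the host picks at random when both unchosen doors hide avocados.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round and return True if the contestant wins the prize."""
    doors = ["A", "B", "C"]
    prize = rng.choice(doors)
    pick = "A"  # the contestant always starts by pointing at Door A
    # The host opens a door that is neither the contestant's pick nor the prize.
    # Assumption: when two such doors exist, the host chooses between them at random.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the single door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch: bool, rounds: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds won when always sticking or always switching."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


if __name__ == "__main__":
    print(f"stick:  {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

Sticking should win about a third of the time and switching about two thirds, give or take sampling noise.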

I’ve writ­ten a lot re­cently about de­coup­ling. At the heart of de­coup­ling is the abil­ity to over­ride an in­tu­it­ive Sys­tem 1 an­swer with a Sys­tem 2 an­swer that’s ar­rived at by ap­ply­ing lo­gic and rules. This abil­ity to over­ride is what’s meas­ured by the Cog­nit­ive Re­flec­tion Test, the stand­ard test of ra­tion­al­ity. Re­mark­ably, many people fail to im­prove on the CRT even after tak­ing it mul­tiple times.

When rationalists talk about “me” and “my brain”, the former refers to their System 2 and the latter to their System 1 and their unconscious. “I wanted to get some work done, but my brain decided it needs to scroll through Twitter for an hour.” But in almost any other group, “my brain” is System 2, and what people identify with is their intuition.

Ra­tion­al­ists of­ten un­der­es­tim­ate the sheer in­ab­il­ity of many people to take this first step to­wards ra­tion­al­ity, of real­iz­ing that the first an­swer that pops into their heads could be wrong on re­flec­tion. I’ve talked to people who were ut­terly con­fused by the sug­ges­tion that their gut feel­ings may not cor­res­pond per­fectly to facts. For someone who puts a lot of weight on Sys­tem 2, it’s re­mark­able to see people for whom it may as well not ex­ist.

You know who does really well on the Monty Hall prob­lem? Pi­geons. At least, pi­geons quickly learn to switch doors when the game is re­peated mul­tiple times over 30 days and they can ob­serve that switch­ing doors is twice as likely to yield the prize.

This isn’t some bias in fa­vor of switch­ing either, be­cause when the con­di­tion is re­versed so the prize is made to be twice as likely to ap­pear be­hind the door that was ori­gin­ally chosen, the pi­geons up­date against switch­ing just as quickly:

Re­mark­ably, hu­mans don’t up­date at all on the it­er­ated game. When switch­ing is bet­ter, a third of people re­fuse to switch no mat­ter how long the game is re­peated:

And when switch­ing is worse, a third of hu­mans keep switch­ing:

I first saw this chart when my wife was giv­ing a talk on pi­geon cog­ni­tion (I really mar­ried well). I im­me­di­ately be­came curi­ous about the ex­act sort of hu­man who can yield a chart like that. The study these are taken from is titled Are Birds Smarter than Mathem­aticians?, but the re­spond­ents were not math­em­aticians at all.

Fail­ing to im­prove one iota in the re­peated game re­quires a par­tic­u­lar sort of dys­ra­tionalia, where you’re so cer­tain of your math­em­at­ical in­tu­ition that mount­ing evid­ence to the con­trary only causes you to double down on your wrong­ness. Other stud­ies have shown that people who don’t know math at all, like little kids, quickly up­date to­wards switch­ing. The re­luct­ance to switch can only come from an ex­treme Dun­ning-Kruger ef­fect. These are people whose in­ab­il­ity to do math is matched only by their cer­tainty in their own math­em­at­ical skill.

This little screen­shot tells you everything you need to know about aca­demic psy­cho­logy:

  1. The study had a sample size of 13, but I bet that the grant pro­posal still claimed that the power was 80%.

  2. The title mis­lead­ingly talked about math­em­aticians, even though not a single math­em­atician was harmed when per­form­ing the ex­per­i­ment.

  3. A big chunk of the psychology undergrads could neither figure out the optimal strategy nor learn it from dozens of repeated games.

Monty Hall is a cornerstone puzzle for rationality not just because it has an intuitive answer that’s wrong, but also because it demonstrates a key principle of Bayesian thinking: the importance of counterfactuals. Bayes’ law says that you must update not only on what actually happened, but also on what could have happened.

There are no counterfactuals relevant to Door A, because it never could have been opened. In the world where Door A contains the prize nothing happens to it that doesn’t happen in the world where it smells of avocado. That’s why observing that the host opens Door B doesn’t change the probability of Door A being a winner: it stays at 1/3.

But for Door C, you knew that there was a chance it would be opened, but it wasn’t. And the chance of Door C be­ing opened de­pends on what it con­tains: 0% of be­ing opened if it hides the prize, 75% if it’s the avo­cado. Even be­fore cal­cu­lat­ing the num­bers ex­actly, this tells you that ob­serving the door that is opened should make you up­date the odds of Door C.
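For anyone who wants the exact numbers, here is a small sketch of the Bayes calculation using Python's `fractions` module. The dictionary and function names are mine, and the 1/2 likelihood assumes the host chooses at random when the prize is behind your Door A.

```python
from fractions import Fraction

# Prior: the prize is equally likely to be behind each door.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Chance that the host opens a given door, depending on where the prize is,
# given that you pointed at Door A.
likelihood_b_opened = {
    "A": Fraction(1, 2),  # assumption: host picks B or C at random
    "B": Fraction(0),     # host never reveals the prize
    "C": Fraction(1),     # host is forced to open B
}
likelihood_c_opened = {
    "A": Fraction(1, 2),
    "B": Fraction(1),
    "C": Fraction(0),
}


def posterior(prior, likelihood):
    """Bayes' rule: multiply prior by likelihood, then normalize."""
    unnormalized = {door: prior[door] * likelihood[door] for door in prior}
    total = sum(unnormalized.values())
    return {door: p / total for door, p in unnormalized.items()}


# Where the 75% comes from: the chance Door C is opened given it hides an avocado,
# averaging over the two worlds (prize behind A, prize behind B) where that is true.
p_c_opened_given_avocado = (
    prior["A"] * likelihood_c_opened["A"] + prior["B"] * likelihood_c_opened["B"]
) / (prior["A"] + prior["B"])

post = posterior(prior, likelihood_b_opened)

print(p_c_opened_given_avocado)  # 3/4
for door, p in post.items():
    print(door, p)  # A 1/3, B 0, C 2/3
```

Door A's likelihood of 1/2 happens to equal the overall chance of seeing Door B opened, which is why its probability doesn't budge.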

Con­tra my friend, Monty Hall-esque lo­gic shows up in a lot of places if you know how to no­tice it.

Here’s an in­tu­it­ive ex­ample: you enter a yearly per­form­ance re­view with your cur­mudgeonly boss who loves to point out your faults. She com­plains that you al­ways sub­mit your ex­pense re­ports late, which an­noys the HR team. Is this good or bad news?

It’s ex­cel­lent news! Your boss is a lot more likely to com­plain about some minor de­tail if you’re do­ing great on everything else, like ac­tu­ally get­ting the work done with your team. The fact that she showed you a stink­ing avo­cado be­hind the “ex­pense re­port timeli­ness” door means that the other doors are likely praise­worthy.

Another ex­ample – there are 9 French res­taur­ants in town: Chez Arnaud, Chez Bern­ard, Chez Claude etc. You ask your friend who has tried them all which one is the best, and he in­forms you that it’s Chez Ja­cob. What does this mean for the other res­taur­ants?

It means that the other eight res­taur­ants are slightly worse than you ini­tially thought. The fact that your friend could’ve picked any of them as the best but didn’t is evid­ence against them be­ing great.

To put some numbers on it, your initial model could be that the nine restaurants are equally likely to occupy any percentile of quality among French restaurants in the country. If percentiles are confusing, imagine that any restaurant is equally likely to get any score of quality between 0 and 100. You’d expect any of the nine to be the 50th percentile restaurant, or to score 50/100, on average. Learning that Chez Jacob is the best means that you should expect it to score 90, since the maximum of N independent variables distributed uniformly over [0,1] has an expected value of N/(N+1), which is 9/10 for N = 9. You should update that the average of the eight remaining restaurants scores 45.

The shortcut to figuring that out is considering some probability or quantity that is conserved after you made your observation. In Monty Hall, the quantity that is conserved is the 2/3 chance that the prize is behind one of the doors B or C (because the chance of Door A is 1/3 and is independent of the observation that another door is opened). After Door B is opened, the chance of it containing the prize goes down by 1/3, from 1/3 to 0. This means that the chance of Door C should increase by 1/3, from 1/3 to 2/3.

Sim­il­arly, be­ing told that Chez Ja­cob is the best up­graded it from 50 to 90, a jump of 40 per­cent­iles or points. But your ex­pect­a­tion that the av­er­age of all nine res­taur­ants in town is 50 shouldn’t change, as­sum­ing that your friend was go­ing to pick out one best res­taur­ant re­gard­less of the over­all qual­ity. Since Chez Ja­cob gained 40 points, the other eight res­taur­ants have to lose 5 points each to keep the av­er­age the same. Thus, your pos­terior ex­pect­a­tion for them went down from 50 to 45.
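The restaurant numbers can be checked the same way. This is a rough sketch under the model above, where each restaurant's score is an independent uniform draw between 0 and 100; the function name `simulate_restaurants` is made up for the illustration.

```python
import random


def simulate_restaurants(n_restaurants: int = 9, trials: int = 100_000, seed: int = 0):
    """Return (average score of the best restaurant, average score of the rest)."""
    rng = random.Random(seed)
    best_total = 0.0
    rest_total = 0.0
    for _ in range(trials):
        # Assumption from the text: scores are independent and uniform on [0, 100].
        scores = [rng.uniform(0, 100) for _ in range(n_restaurants)]
        best = max(scores)
        best_total += best
        # Average of the other restaurants once the best one is set aside.
        rest_total += (sum(scores) - best) / (n_restaurants - 1)
    return best_total / trials, rest_total / trials


if __name__ == "__main__":
    best_avg, rest_avg = simulate_restaurants()
    print(f"best restaurant: {best_avg:.1f}")
    print(f"other eight:     {rest_avg:.1f}")
```

The best restaurant should come out near 90 and the other eight near 45, so the average of all nine stays at 50.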

Here’s one more ex­ample of coun­ter­fac­tual-based think­ing. It’s not par­al­lel to Monty Hall at first glance, but the same Bayesian lo­gic un­der­pins both of them.

I’ve spoken to two wo­men who write in­ter­est­ing things but are hes­it­ant to post them on­line be­cause people tell them they suck and call them $&#@s. They asked me how I deal with be­ing called a $&#@ on­line. My an­swer is that I real­ized that most of the time when someone calls me a $&#@ on­line I should up­date not that I may be a $&#@, but only that I’m get­ting more fam­ous.

There are two kinds of people who can pos­sibly re­spond to some­thing I’ve writ­ten by telling me I’m a $&#@:

  1. People who read all my stuff and are usu­ally pos­it­ive, but in this case thought I was a $&#@.

  2. People who go around call­ing oth­ers $&#@s on­line, and in this case just happened to click on Putanu­monit.

In the first case, the counterfactual to being called a $&#@ is getting a compliment, which should make me think that I $&#@ed up with that particular essay. When I get negative comments from regular readers, I update.

In the second case, the coun­ter­fac­tual to me be­ing called a $&#@ is simply someone else be­ing called a $&#@. My up­date is simply that my es­say has got­ten widely shared, which is great.
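To see the difference between the two cases in numbers, here's a toy Bayes calculation. Every probability in it is hypothetical and picked only for illustration, and `posterior_bad` is a made-up helper, not anything measured from real comments.

```python
def posterior_bad(prior_bad: float, p_insult_if_bad: float, p_insult_if_good: float) -> float:
    """P(essay is bad | commenter insulted me), by Bayes' rule."""
    joint_bad = prior_bad * p_insult_if_bad
    joint_good = (1 - prior_bad) * p_insult_if_good
    return joint_bad / (joint_bad + joint_good)


# Hypothetical prior that any given essay is a dud.
PRIOR_BAD = 0.2

# Hypothetical regular reader: usually complimentary, insults mostly when the essay is bad.
regular = posterior_bad(PRIOR_BAD, p_insult_if_bad=0.5, p_insult_if_good=0.02)

# Hypothetical drive-by insulter: insults at the same rate whatever the essay's quality.
drive_by = posterior_bad(PRIOR_BAD, p_insult_if_bad=0.9, p_insult_if_good=0.9)

print(f"after an insult from a regular reader: {regular:.2f}")
print(f"after an insult from a drive-by:       {drive_by:.2f}")
```

With these made-up numbers the regular reader's insult moves the estimate a lot, while the drive-by's insult leaves it at the prior, because the insult was equally likely either way.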

For example, here’s a comment that starts off with “Fuck you Aspie scum”. The comment has nothing to do with my actual post; I’m pretty sure that its author just copy-pastes it on rationalist blogs that he happens to come across. Given this, I found the comment to be a positive sign of the expanding reach of Putanumonit. It did nothing to make me worry that I’m an Aspie scum who should be fucked.

I find this sort of Bayesian what’s-the-coun­ter­fac­tual think­ing im­mensely valu­able and widely ap­plic­able. I think that it’s a teach­able skill that any­one can im­prove on with prac­tice, even if it comes easier to some than to oth­ers.

Only pi­geons are born mas­ter Bayesians.