So You Think You’re a Bayesian? The Natural Mode of Probabilistic Reasoning

Related to: The Conjunction Fallacy, Conjunction Controversy

The heuristics and biases research program in psychology has discovered many different ways that humans fail to reason correctly under uncertainty. In experiment after experiment, researchers have shown that we use heuristics to approximate probabilities rather than making the appropriate calculation, and that these heuristics are systematically biased. However, a tweak in the experimental protocols seems to remove the biases altogether and cast doubt on whether we are actually using heuristics at all. Instead, it appears that the errors are simply an artifact of how our brains internally store information about uncertainty. Theoretical considerations support this view.

EDIT: The view presented here is controversial in the heuristics and biases literature; see Unnamed's comment on this post below.

EDIT 2: The author no longer holds the views presented in this post. See this comment.

A common example of the failure of humans to reason correctly under uncertainty is the conjunction fallacy. Consider the following question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

What is the probability that Linda is:

(a) a bank teller

(b) a bank teller and active in the feminist movement

In a replication by Gigerenzer (1993), 91% of subjects ranked (b) as more probable than (a), saying that it is more likely that Linda is active in the feminist movement AND a bank teller than that Linda is simply a bank teller. The conjunction rule of probability states that the probability of two things both being true is less than or equal to the probability of either one alone being true. Formally, P(A & B) ≤ P(A). So this experiment shows that people violate the conjunction rule, and thus fail to reason correctly under uncertainty. The representativeness heuristic has been proposed as an explanation for this phenomenon. To use this heuristic, you evaluate the probability of a hypothesis by comparing how "alike" it is to the data. Someone using the representativeness heuristic looks at the Linda question and sees that Linda's characteristics resemble those of a feminist bank teller much more closely than those of just a bank teller, and so they conclude that Linda is more likely to be a feminist bank teller than a bank teller.
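
If the rule seems abstract, here is a minimal sketch with invented counts for a hypothetical population of 1000 people; the numbers are mine, purely illustrative:

```python
# A minimal sketch of the conjunction rule, P(A & B) <= P(A),
# using invented counts for a hypothetical population.
n_total = 1000
n_teller = 30           # hypothetical count of bank tellers
n_teller_feminist = 8   # hypothetical count of feminist bank tellers

# Everyone counted in the conjunction is also counted in the marginal,
# so the joint probability can never exceed the marginal probability.
p_teller = n_teller / n_total
p_teller_and_feminist = n_teller_feminist / n_total
print(p_teller_and_feminist <= p_teller)  # True
```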

This is the standard story, but are people really using the representativeness heuristic in the Linda problem? Consider the following rewording of the question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

There are 100 people who fit the description above. How many of them are:

(a) bank tellers

(b) bank tellers and active in the feminist movement

Notice that the question is now strictly in terms of frequencies. Under this version, only 22% of subjects ranked (b) as more probable than (a) (Gigerenzer, 1993). The only thing that changed is the question being asked; the description of Linda (and the 100 people) remains unchanged, so the representativeness of the description for the two groups should remain unchanged. Thus people are not using the representativeness heuristic—at least not in general.

Tversky and Kahneman, champions and founders of the heuristics and biases research program, acknowledged that the conjunction fallacy can be mitigated by changing the wording of the question (1983, p. 309), but this isn't the only anomaly. Consider another problem:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

Using Bayes' theorem, the correct answer is .02, or 2%. In one replication, only 12% of subjects correctly calculated this probability. In these experiments, the most common wrong answer given is usually .95, or 95% (Gigerenzer, 1993). This is what's known as the base rate fallacy because the error comes from ignoring the "base rate" of the disease in the population. Intuitively, if absolutely no one has the disease, it doesn't matter what the test says—you still wouldn't think you had the disease.
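
For concreteness, here is a minimal sketch of that calculation, assuming (as the frequency version below makes explicit) that the test catches every true case:

```python
# Bayes' theorem for the disease problem.
prevalence = 1 / 1000        # P(disease): the base rate
false_positive_rate = 0.05   # P(positive | healthy)
sensitivity = 1.0            # P(positive | disease), assumed to be 1

# Total probability of testing positive, across sick and healthy people.
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # 0.02, i.e. about 2%
```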

Now consider the same question framed in terms of relative frequencies.

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease?

_____ out of _____.

Using this version of the question, 76% of subjects answered correctly with 1 out of 50. Instructing subjects to visualize frequencies in graphs increases this percentage to 92% (Gigerenzer, 1993). Again, re-framing the question in terms of relative frequencies rather than (subjective) probabilities results in improved performance on the test.
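
In the frequency format the same calculation reduces to counting. A minimal sketch, using the numbers from the problem:

```python
# Out of 1000 randomly sampled people, about 1 has the disease (and
# tests positive, since the test catches every true case), and about
# 50 of the 999 healthy people also test positive.
true_positives = 1
false_positives = round(999 * (50 / 1000))  # ~50 healthy positives
total_positives = true_positives + false_positives
print(f"{true_positives} out of {total_positives}")  # 1 out of 51, roughly 1 in 50
```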

Consider yet another typical question in these experiments:

Which city has more inhabitants?

(a) Hyderabad

(b) Islamabad

How confident are you that your answer is correct?

50%, 60%, 70%, 80%, 90%, 100%

According to Gigerenzer (1993),

The major finding of some two decades of research is the following: In all the cases where subjects said, "I am 100% confident that my answer is correct," the relative frequency of correct answers was only about 80%; in all the cases where subjects said, "I am 90% confident," the relative frequency of correct answers was only about 75%; when subjects said, "I am 80% confident," the relative frequency of correct answers was only about 65%, and so on.

This is called overconfidence bias. A Bayesian might say that you aren't calibrated. In any case, it's generally frowned upon by both statistical camps. If, when you say you're 90% confident, you're only right 80% of the time, why not just say you're 80% confident? But consider a different experimental setup. Instead of asking subjects only one general knowledge question like the Hyderabad-Islamabad question above, ask them 50; and instead of asking them how confident they are that their answer is correct every time, ask them at the end how many they think they answered correctly. If people are biased in the way that overconfidence bias says they are, there should be no difference between the two experiments.

First, Gigerenzer replicated the original experiments, showing an overconfidence bias of 13.8%; that is, subjects were an additional 13.8% more confident than the true relative frequency of correct answers, on average. For example, if they claimed a confidence of 90%, on average they would answer correctly 76.2% of the time. Using the 50-question treatment, overconfidence bias dropped to −2.4%! In a second replication, the control was 15.4% and the treatment was −4.2% (1993). Note that −2.4% and −4.2% are likely not significantly different from 0, so don't interpret that as underconfidence bias. Once the probability judgment was framed in terms of relative frequencies, the bias basically disappeared.
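
For concreteness, here is a minimal sketch of how overconfidence is measured in these experiments; the answer data below are invented, purely for illustration:

```python
# Compare stated confidence with the relative frequency of correct
# answers across a batch of general-knowledge questions.
confidences = [1.0, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.6, 0.5]
correct     = [1,   1,   0,   1,   0,   1,   0,   1,   0,   1]  # 1 = right

mean_confidence = sum(confidences) / len(confidences)  # 0.77
hit_rate = sum(correct) / len(correct)                 # 0.60
print(f"overconfidence: {mean_confidence - hit_rate:+.1%}")  # +17.0%
```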

So in all three experiments, the standard results of the heuristics and biases program fall once the problem is recast in terms of relative frequencies. Humans don't simply use heuristics; something else more complicated is going on. But the important question is, of course: what else? To answer that, we need to take a detour through information representation. Any computer—and the brain is just a computer that is very difficult to understand—has to represent its information symbolically. The problem is that there are usually many ways to represent the same information. For example, 31, 11111, and XXXI all represent the same number using different systems of representation. Aside from the obvious visual differences, systems of representation also differ in how easy they are to use for a variety of operations. If this doesn't seem obvious, as Gigerenzer says, try long division using Roman numerals (1993). Crucially, this difficulty is relative to the computer attempting to perform the operations. Your calculator works great in binary, but your brain works better when things are represented visually.
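
To make the point concrete, here is a trivial sketch showing one quantity under three representations; the Roman-numeral converter is my own toy helper, not anything from the literature:

```python
# The same quantity in three notations. Which operations are easy
# depends on the representation, not on the quantity itself.
n = 31
print(n)               # decimal: 31
print(format(n, "b"))  # binary: 11111

def to_roman(n):
    # Toy converter, handles values below 40; enough for this example.
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out

print(to_roman(n))     # Roman: XXXI
```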

What does the representation of information have to do with the experimental results above? Well, let's take another detour—this time through the philosophy of probability. As most of you already know, the two most common positions are frequentism and Bayesianism. I won't get into the details of either position beyond what is relevant, so if you're unaware of the difference and are interested, click the links. According to the Bayesian position, all probabilities are subjective degrees of belief. Don't worry about the sense in which probabilities are subjective; just focus on the degrees of belief part. A Bayesian is comfortable assigning a probability to any proposition you can come up with. Some Bayesians don't even care if the proposition is coherent.

Frequentists are different beasts altogether. For a frequentist, the probability of an event happening is its relative frequency in some well-defined reference class. A useful though not entirely accurate way to think about frequentist probability is that there must be a numerator and a denominator in order to get a probability. The reference class of events you are considering provides the denominator (the total number of events), and the particular event you are considering provides the numerator (the number of times that particular event occurs in the reference class). If you flip a coin 100 times and get 37 heads and are interested in heads, the reference class is coin flips. Then the probability of flipping a coin and getting heads is 37/100.¹ Key to all of this is that the frequentist thinks there is no such thing as the probability of a single event happening without referring to some reference class. So, returning to the Linda problem, there is no such thing as a frequentist probability that Linda is a bank teller, or a bank teller and active in the feminist movement. But there is a probability that, out of 100 people who have the same description as Linda, a randomly selected person is a bank teller, or a bank teller and active in the feminist movement.

In addition to the various philosophical differences between Bayesians and frequentists, the two schools also naturally lead to two different ways of representing the information contained in probabilities. Since all the frequentist cares about is relative frequencies, the natural way to represent probabilities in her mind is through, well, frequencies. The actual number representing the probability (e.g. p = .23) can always be calculated later as an afterthought. The Bayesian approach, on the other hand, leads to thinking in terms of percentages. If probability is just a degree of belief, why not represent it as such with, say, a number between 0 and 1? A "natural frequentist" would store all probabilistic information as frequencies, carefully counting each time an event occurs, while a "natural Bayesian" would store it as a single number—a percentage—to be updated later using Bayes' theorem as information comes in. It wouldn't be surprising if the natural frequentist had trouble operating with Bayesian probabilities. She thinks in terms of frequencies, but a single number isn't a frequency—it has to be converted to a frequency in some way that allows her to keep counting events accurately if she wants to use this information.
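
Here is a toy contrast between the two storage schemes, with invented numbers; it is a sketch of the distinction, not a model of the brain:

```python
# Frequentist-style store: tally events as they occur.
heads = flips = 0
for outcome in (1, 0, 1, 1, 0):   # hypothetical coin flips, 1 = heads
    heads += outcome
    flips += 1
print(heads, "out of", flips)      # 3 out of 5

# Bayesian-style store: a single degree of belief, updated via Bayes'
# rule in odds form: posterior odds = prior odds * likelihood ratio.
prior = 0.5               # initial degree of belief in hypothesis H
likelihood_ratio = 3.0    # hypothetical P(E | H) / P(E | not H)
odds = prior / (1 - prior) * likelihood_ratio
print(round(odds / (1 + odds), 2))  # 0.75, the updated degree of belief
```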

So if it isn’t ob­vi­ous by now, we’re nat­u­ral fre­quen­tists! How many of you thought you were Bayesi­ans?2 Gigeren­zer’s ex­per­i­ments show that chang­ing the rep­re­sen­ta­tion of un­cer­tainty from prob­a­bil­ities to fre­quen­cies dras­ti­cally al­ters the re­sults, mak­ing hu­mans ap­pear much bet­ter at statis­ti­cal rea­son­ing than pre­vi­ously thought. It’s not that we use heuris­tics that are sys­tem­at­i­cally bi­ased, our na­tive ar­chi­tec­ture for rep­re­sent­ing un­cer­tainty is just bet­ter at work­ing with fre­quen­cies. When un­cer­tainty isn’t rep­re­sented us­ing fre­quen­cies, our brains have trou­ble and fail in ap­par­ently pre­dictable ways. To any­one who had Bayes’ the­o­rem in­tu­itively ex­plained to them, it shouldn’t be all that sur­pris­ing that we’re nat­u­ral fre­quen­tists. How does Eliezer in­tu­itively ex­plain Bayes’ the­o­rem? By work­ing through ex­am­ples us­ing rel­a­tive fre­quen­cies. This is also a rel­a­tively com­mon tac­tic in un­der­grad­u­ate statis­tics text­books, though it may only be be­cause un­der­grad­u­ates typ­i­cally are taught only the fre­quen­tist ap­proach to prob­a­bil­ity.

So the heuristics and biases program doesn't catalog the various ways that we fail to reason correctly under uncertainty, but it does catalog the various ways we reason incorrectly about probabilities that aren't in our native representation. This could be because our native architecture just doesn't handle alternate representations of probability effectively, or it could be because, when our native architecture starts having trouble, our brains automatically resort to using the heuristics Tversky and Kahneman were talking about. The latter seems more plausible to me in light of the other ways the brain approximates when it is forced to, but I'm still fairly uncertain. Gigerenzer has his own explanation that unifies the two domains under a specific theory of natural frequentism, and he has performed further experiments to back it up. He calls his explanation a theory of probabilistic mental models.³ I don't completely understand Gigerenzer's theory, and his extra evidence seems to equally support the hypothesis that our brains use heuristics when probabilities aren't represented as frequencies, but I will say that Gigerenzer's theory does have elegance going for it. Capturing both groups of phenomena with a unified theory makes Occam smile.

These experiments aren't the only reason to believe that we're actually pretty good at reasoning under uncertainty or that we're natural frequentists; there are theoretical reasons as well. First, consider evolutionary theory. If lower-order animals are decent at statistical reasoning, we would probably expect humans to be good at it as well, since we all evolved from the same source. It is possible that a lower-order species developed its statistical reasoning capabilities after its evolutionary path diverged from the ancestors of humans, or that statistical reasoning became less important for humans or their recent ancestors and thus evolution committed fewer resources to the process. But the ability to reason under uncertainty seems so useful that, if any species has the mental capacity to do it, we would expect humans to, with their large, adept brains. Gigerenzer summarizes the evidence across species (1993):

Bumblebees, birds, rats, and ants all seem to be good intuitive statisticians, highly sensitive to changes in frequency distributions in their environments, as recent research in foraging behavior indicates (Gallistel, 1990; Real & Caraco, 1986). From sea snails to humans, as John Staddon (1988) argued, the learning mechanisms responsible for habituation, sensitization, and classical and operant conditioning can be described in terms of statistical inference machines. Reading this literature, one wonders why humans seem to do so badly in experiments on statistical reasoning.

Indeed. Should we really expect that bumblebees, birds, rats, and ants are better intuitive statisticians than us? It's certainly possible, but it doesn't appear all that likely, a priori.

Theories of the brain from cognitive science provide another reason why we would be adept at reasoning under uncertainty, and a reason why we would be natural frequentists. The connectionist approach to the study of the human mind suggests that the brain encodes information by making literal physical connections between neurons, represented on the mental level by connections between concepts. So, for example, if you see a dog and notice that it's black, a connection between the concept "dog" and the concept "black" is made in a very literal sense. If connectionism is basically correct, then probabilistic reasoning shouldn't be all that difficult for us. For example, if the brain needs to calculate the probability that any given dog is black, it can just count the number of connections between "dog" and "black" and the number of connections between "dog" and colors other than black.⁴ Voila! Relative frequencies. As Nobel Prize-winning economist Vernon Smith puts it (2008, p. 208):

Hayek’s the­ory5 - that men­tal cat­e­gories are based on the ex­pe­ri­en­tial rel­a­tive fre­quency of co­in­ci­dence be­tween cur­rent and past per­cep­tions—seems to im­ply that our minds should be good at prob­a­bil­ity rea­son­ing.

It also suggests that we would be natural frequentists, since our brains are quite literally built on relative frequencies.
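
Here is a toy version of that counting story, with invented sightings; it is only meant to illustrate how raw co-occurrence counts yield relative frequencies:

```python
# Estimate P(black | dog) by counting concept "connections"
# (co-occurrences of "dog" with a color) across remembered sightings.
sightings = [("dog", "black"), ("dog", "brown"), ("dog", "black"),
             ("dog", "white"), ("dog", "black")]

dog_black = sum(1 for animal, color in sightings if color == "black")
dog_total = len(sightings)
print(dog_black / dog_total)  # 0.6: relative frequency of black dogs
```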

So both evidence and theory point in the same direction. The research of Tversky and Kahneman, among others, originally showed that humans were fairly bad at reasoning under uncertainty. It turns out much of this is an artifact of how their subjects were asked to think about uncertainty. Having subjects think in terms of frequencies basically eliminates the biases in these experiments, suggesting that humans are just natural frequentists: their minds are structured to handle probabilities in terms of frequencies rather than proportions or percentages. Only when we are working with information represented in a form that is difficult for our native architecture to handle do we appear to be using heuristics. Theoretical considerations from both evolutionary biology and cognitive science buttress both claims—that humans are natural frequentists, and that we are not so bad at handling uncertainty—at least when thinking in terms of frequencies.


Footnotes

1: To any of you who raised an eyebrow, I did it on purpose ;).

2: Just to be clear, I am not arguing that since we are natural frequentists, the frequentist approach to probability is the correct approach.

3: What seems to be the key paper is the second link in the Google search I linked to. I haven't read it yet, so I won't really get into his theory here.

4: I acknowledge that this is a very simplified example and a gross simplification of the theory.

5: Friedrich Hayek, another Nobel Prize-winning economist, independently developed the connectionist paradigm of the mind, culminating in his 1952 book The Sensory Order. I do recommend reading Hayek's book, but not without a reading group of some sort. It's short but dense and very difficult to parse—let's just say Hayek is not known for his prose.

References

Gigerenzer, Gerd. 1993. "The Bounded Rationality of Probabilistic Mental Models." In Manktelow, K. I., and Over, D. E., eds., Rationality: Psychological and Philosophical Perspectives (pp. 284-313). London: Routledge. Preprint available online.

Smith, Vernon L. 2008. Rationality in Economics. Cambridge: Cambridge UP.

Tversky, A., and D. Kahneman. 1983. "Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment." Psychological Review 90(4): 293-315. Available online.