Simultaneous Overconfidence and Underconfidence

Follow-up to this and this on my personal blog. Prep for this meetup. Cross-posted on my blog.

Eliezer talked about cognitive bias, statistical bias, and inductive bias in a series of posts, only the first of which made it directly into the LessWrong sequences as currently organized (unless I've missed them!). Inductive bias helps us leap to the right conclusion from the evidence, if it captures good prior assumptions. Statistical bias can be good or bad, depending in part on the bias-variance trade-off. Cognitive bias refers only to obstacles which prevent us from thinking well.

Unfortunately, as we shall see, psychologists can be quite inconsistent about how cognitive bias is defined. This created a paradox in the history of cognitive bias research. One well-researched and highly experimentally validated effect was conservatism, the tendency to give estimates that are too middling, or probabilities too near 50%. This relates especially to integration of information: when given evidence about a situation, people tend not to take it fully into account, as if they are stuck with their prior. Another highly validated effect was overconfidence, relating especially to calibration: when people give high subjective probabilities like 99%, they are typically wrong with much higher frequency than 1%.

In real-life situations, these two findings contradict each other: there is no clean distinction between information-integration tasks and calibration tasks. A person's subjective probability is always, in some sense, an integration of the information they've been exposed to. In practice, then, when should we expect other people to be under- or over-confident?

Simultaneous Overconfidence and Underconfidence

The conflict was resolved in an excellent paper by Ido Erev et al., which showed that it's the result of how psychologists did their statistics. Essentially, one group of psychologists defined bias one way, and the other defined it another way. The results are not really contradictory; they are measuring different things. In fact, you can find underconfidence or overconfidence in the same data set by applying the different statistical techniques; it has little or nothing to do with the differences between information-integration tasks and probability-calibration tasks. Here's my rough drawing of the phenomenon (apologies for my hand-drawn illustrations):

Overconfidence here refers to probabilities which are more extreme than they should be, here illustrated as being further from 50%. (This baseline makes sense when choosing from two options, but won't always be the right baseline to think about.) Underconfident subjective probabilities are associated with more extreme objective probabilities, which is why that slope tilts up in the figure. The overconfident line similarly tilts down, indicating that the subjective probabilities are associated with less extreme objective probabilities. Unfortunately, if you don't know how the lines are computed, this means less than you might think. Erev et al. show that these two regression lines can be derived from just one data set. I found the paper easy and fun to read, but I'll explain the phenomenon in a different way here by relating it to the concepts of statistical bias and the tails coming apart.

The Tails Come Apart

Everyone who has read Why the Tails Come Apart will likely recognize this image:

The idea is that even if X and Y are highly correlated, the most extreme X values and the most extreme Y values will differ. I've labelled the difference the "curse" after the optimizer's curse: if you optimize a criterion X which is merely correlated with the thing Y you actually want, you can expect to be disappointed.

Applying the idea to calibration, we can say that the most extreme subjective beliefs are almost certainly not the most extreme on the objective scale. That is: a person's most confident beliefs are almost certainly overconfident. A belief is not likely to have worked its way up to the highest peak of confidence by merit alone. It's far more likely that some merit, but also some error in reasoning, combined to yield high confidence. This sounds like the calibration literature, which found that people are generally overconfident. What about underconfidence? By a symmetric argument, the points with the most extreme objective probabilities are not likely to be the same as those with the highest subjective belief; errors in our thinking are much more likely to make us underconfident than overconfident in those cases.
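A toy simulation (my own sketch, not from the post; the noise model is an assumption) makes the "curse" concrete: model each belief's objective support as merit, and subjective confidence as merit plus independent noise, then look only at the most confident beliefs.

```python
import numpy as np

# Toy model (assumed for illustration): confidence = merit + independent noise.
rng = np.random.default_rng(0)
n = 100_000
merit = rng.normal(size=n)                          # objective support
confidence = merit + rng.normal(scale=0.5, size=n)  # correlated, but noisy

top = np.argsort(confidence)[-100:]  # the 100 most confident beliefs
print(confidence[top].mean())        # very extreme on the subjective scale
print(merit[top].mean())             # noticeably less extreme: the "curse"
```

Selecting on one variable regresses the other toward its mean: the most confident beliefs are genuine outliers on the confidence scale but only moderately strong on the merit scale, even though the two are highly correlated overall.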

This argument tells us about extreme points, but not about the overall distribution. So, how does this explain simultaneous overconfidence and underconfidence? To understand that, we need to understand the statistics the psychologists used. We'll use averages rather than maximums, leading to a "soft version" which shows the tails coming apart gradually, rather than only at the extreme ends.

Statistical Bias

Statistical bias is defined through the notion of an estimator. We have some quantity we want to know, X, and we use an estimator to guess what it might be. The estimator will be some calculation which gives us our estimate, which I will write as X^. An estimator is derived from noisy information, such as a sample drawn at random from a larger population. The difference between the estimator and the true value, X^ − X, would ideally be zero; however, this is unrealistic. We expect estimators to have error, but systematic error is referred to as bias.

Given a particular value for X, the bias is defined as the expected value of X^ − X, written E_X(X^ − X). An unbiased estimator is an estimator such that E_X(X^ − X) = 0 for any value of X we choose.
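As a quick check of the definition (my own sketch; the particular numbers are arbitrary), we can fix a true value X "from the outside", compute the estimator many times from fresh noisy samples, and confirm that the average of X^ − X is near zero when the estimator is the sample mean:

```python
import numpy as np

# Fix the unknown X "from the outside", then check E_X(X^ - X) ~ 0
# for the sample-mean estimator. (Illustrative values, not from the post.)
rng = np.random.default_rng(1)
X = 3.0                                   # the fixed unknown quantity
samples = rng.normal(loc=X, scale=2.0, size=(20_000, 10))
estimates = samples.mean(axis=1)          # X^ computed from each 10-item sample
bias = (estimates - X).mean()             # empirical expected value of X^ - X
print(bias)                               # close to zero: the sample mean is unbiased
```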

Due to the bias-variance trade-off, unbiased estimators are not the best way to minimize error in general. However, statisticians still love unbiased estimators. It's a nice property to have, and in situations where it works, it has a more objective feel than estimators which use bias to further reduce error.
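A standard example of the trade-off (mine, not the post's): when estimating a variance from a small normal sample, dividing the sum of squared deviations by n gives a biased estimator, yet it has lower mean squared error than the unbiased divisor n − 1.

```python
import numpy as np

# Biased-but-lower-error: variance estimators with divisors n and n-1.
rng = np.random.default_rng(2)
true_var = 4.0
samples = rng.normal(scale=true_var ** 0.5, size=(50_000, 5))  # n = 5 per trial
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
mse_unbiased = ((ss / 4 - true_var) ** 2).mean()  # divisor n-1: unbiased
mse_biased = ((ss / 5 - true_var) ** 2).mean()    # divisor n: biased, smaller error
print(mse_unbiased, mse_biased)
```

The biased estimator accepts a small systematic shift in exchange for a larger reduction in variance, which is exactly why unbiasedness alone doesn't minimize error.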

Notice that the definition of bias takes X as fixed; that is, it fixes the quantity which we don't know. Given a fixed X, the unbiased estimator's average value will equal X. This is a picture of bias which can only be evaluated "from the outside"; that is, from a perspective in which we can fix the unknown X.

A more inside-view approach to statistical estimation is to consider a fixed body of evidence, and make the estimator equal the average value of the unknown given that evidence. This is exactly inverse to unbiased estimation:

In the image, we want to estimate the unknown Y from the observed X. The two variables are correlated, just like in the earlier "tails come apart" scenario. The average-Y estimator tilts down because good estimates tend to be conservative: because I only have partial information about Y, I want to take into account what I see from X, but also pull toward the average value of Y to be safe. On the other hand, unbiased estimators tend to be overconfident: the effect of X is exaggerated. For a fixed Y, the average Y^ is supposed to equal Y. However, for fixed Y, the X we get will lean toward the mean X (just as, for a fixed X, we observed that the average Y leans toward the mean Y). Therefore, in order for Y^ to be high enough, it needs to pull up sharply: middling values of X need to give more extreme Y^ estimates.
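Both lines can be computed from a single simulated data set (a sketch under an assumed toy model: X and Y standardized with correlation rho). Regressing Y on X gives the conservative average-Y slope of about rho; demanding unbiasedness for each fixed Y forces the steeper slope of about 1/rho:

```python
import numpy as np

# Toy model (assumed): X and Y each standard normal with correlation rho.
rng = np.random.default_rng(3)
n, rho = 200_000, 0.6
shared = rng.normal(size=n)
X = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n)
Y = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n)

# Average-Y estimator: regress Y on X; slope ~ rho < 1 (pulls toward the mean).
slope_avg = np.cov(X, Y)[0, 1] / X.var()

# Unbiased estimator Y^ = a*X: since E[X|Y] = rho*Y, requiring E[Y^|Y] = Y
# forces a = 1/rho > 1 -- middling X must yield extreme Y^.
slope_unbiased = Y.var() / np.cov(X, Y)[0, 1]

print(slope_avg, slope_unbiased)  # roughly 0.6 and 1.67
```

The same cloud of points yields a shallow line when we average Y per X and a steep line when we insist on unbiasedness per Y; neither line is "the" relationship.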

If we superimpose this on top of the tails-come-apart image, we see that this is something like a generalization:

Wrapping It All Up

The punchline is that these two different regression lines are exactly what yield simultaneous underconfidence and overconfidence. The studies of conservatism took the objective probability as the independent variable, and graphed people's subjective probabilities as a function of it. The natural next step is to take the average subjective probability per fixed objective probability. This will tend to show underconfidence due to the statistics of the situation.

The studies on calibration, on the other hand, took the subjective probabilities as the independent variable, graphing the average frequency correct as a function of them. This will tend to show overconfidence, even on the same data that shows underconfidence in the other analysis.
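Both analyses can be run on one simulated data set (my own sketch, loosely in the spirit of the error models in this literature; the specific distributions are assumptions). Subjective log-odds are the true log-odds plus noise, and outcomes are drawn from the true probabilities:

```python
import numpy as np

# One data set, two analyses. Assumed toy error model:
# subjective log-odds = true log-odds + noise; outcomes drawn from true probability.
rng = np.random.default_rng(4)
n = 400_000
t = rng.normal(scale=1.5, size=n)          # true log-odds of each claim
s = t + rng.normal(scale=1.0, size=n)      # subjective log-odds (with error)
p_true = 1 / (1 + np.exp(-t))              # objective probability
p_subj = 1 / (1 + np.exp(-s))              # stated probability
correct = rng.random(n) < p_true           # realized outcomes

# Calibration analysis: fix high *subjective* probability, measure hit rate.
conf = p_subj > 0.9
print(p_subj[conf].mean(), correct[conf].mean())  # hit rate falls short: "overconfidence"

# Conservatism analysis: fix high *objective* probability, measure stated probability.
easy = p_true > 0.9
print(p_true[easy].mean(), p_subj[easy].mean())   # stated falls short: "underconfidence"
```

The very same data set reads as overconfident when sliced by subjective probability and as underconfident when sliced by objective probability, which is the resolution of the paradox.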

From an individual's standpoint, overconfidence is the real phenomenon. Errors in judgement tend to make us overconfident rather than underconfident, because errors make the tails come apart: if you select our most confident beliefs, it's a good bet that they have only mediocre support from evidence, even if, generally speaking, our level of belief is highly correlated with how well-supported a claim is. Because the tails come apart gradually, we can expect that the higher our confidence, the larger the gap between that confidence and the level of factual support for that belief.

This is not a fixed fact of human cognition pre-ordained by statistics, however. It's merely what happens due to random error. Not all studies show systematic overconfidence, and in a given study, not all subjects will display overconfidence. Random errors in judgement will tend to create overconfidence as a result of the statistical phenomena described above, but systematic correction is still an option.

I've also written a simple simulation of this. Julia code is here. If you don't have Julia installed or don't want to install it, you can run the code online at JuliaBox.