Unknown Knowns

Previously (Marginal Revolution): Gambling Can Save Science

A study attempted to replicate 21 studies published in Science and Nature.

Beforehand, prediction markets were used to estimate each study’s probability of replicating. The results were as follows (from the original paper):

Fig. 4: Prediction market and survey beliefs.

The prediction market beliefs and the survey beliefs of replicating (from treatment 2 for measuring beliefs; see the Supplementary Methods for details and Supplementary Fig. 6 for the results from treatment 1) are shown. The replication studies are ranked in terms of prediction market beliefs on the y axis, with replication studies more likely to replicate than not to the right of the dashed line. The mean prediction market belief of replication is 63.4% (range: 23.1–95.5%, 95% CI = 53.7–73.0%) and the mean survey belief is 60.6% (range: 27.8–81.5%, 95% CI = 53.0–68.2%). This is similar to the actual replication rate of 61.9%. The prediction market beliefs and survey beliefs are highly correlated, but imprecisely estimated (Spearman correlation coefficient: 0.845, 95% CI = 0.652–0.936, P < 0.001, n = 21). Both the prediction market beliefs (Spearman correlation coefficient: 0.842, 95% CI = 0.645–0.934, P < 0.001, n = 21) and the survey beliefs (Spearman correlation coefficient: 0.761, 95% CI = 0.491–0.898, P < 0.001, n = 21) are also highly correlated with a successful replication.

That is not only a super impressive result. That result is suspiciously, amazingly great.

The mean prediction market belief of replication was 63.4%, the survey mean was 60.6% and the final result was 61.9%. That’s impressive all around.

What’s far more striking is that they knew exactly which studies would replicate. Every study that would replicate traded at a higher probability of success than every study that would fail to replicate.
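To make "exactly sorted" concrete, here is a minimal sketch of how one might check that property. The function names and the five example prices are made up for illustration; they are not the paper's data.

```python
def perfectly_separated(prices, replicated):
    """True if every replicated study was priced above every failed one."""
    winners = [p for p, r in zip(prices, replicated) if r]
    losers = [p for p, r in zip(prices, replicated) if not r]
    if not winners or not losers:
        raise ValueError("need at least one study in each group")
    return min(winners) > max(losers)


def pairwise_ranking_score(prices, replicated):
    """Fraction of (replicated, failed) pairs ranked correctly; ties count half."""
    winners = [p for p, r in zip(prices, replicated) if r]
    losers = [p for p, r in zip(prices, replicated) if not r]
    if not winners or not losers:
        raise ValueError("need at least one study in each group")
    total = 0.0
    for w in winners:
        for l in losers:
            if w > l:
                total += 1.0
            elif w == l:
                total += 0.5
    return total / (len(winners) * len(losers))


# Hypothetical final market prices and outcomes (NOT the paper's numbers).
example_prices = [0.90, 0.75, 0.60, 0.40, 0.25]
clean_outcomes = [True, True, True, False, False]
messy_outcomes = [True, False, True, True, False]

print(perfectly_separated(example_prices, clean_outcomes))
print(pairwise_ranking_score(example_prices, messy_outcomes))
```

A pairwise score of 1.0 is the same thing as perfect separation. The claim above is that the real market hit that mark across all 21 studies.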

Combining that with an almost exactly correct mean success rate, we have a stunning display of knowledge and of under-confidence.
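One way to see why a correct mean plus a perfect sort implies under-confidence is to compare Brier scores. The sketch below uses the 13 and 8 split from the study, but the per-study probabilities in the "hedged" forecast (0.80 and 0.35) are hypothetical stand-ins, chosen only so that the mean lands near the true rate.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# 13 studies replicated and 8 did not (from the post).
outcomes = [1] * 13 + [0] * 8
base_rate = sum(outcomes) / len(outcomes)

# Knows only the overall rate, nothing about which studies are which.
flat = [base_rate] * len(outcomes)

# Hypothetical: sorts perfectly and gets the mean about right, but hedges.
hedged = [0.80] * 13 + [0.35] * 8

# Sorts perfectly and commits fully.
confident = [1.0] * 13 + [0.0] * 8

for name, forecast in [("flat", flat), ("hedged", hedged), ("confident", confident)]:
    mean_belief = sum(forecast) / len(forecast)
    print(f"{name:9s} mean={mean_belief:.3f} brier={brier_score(forecast, outcomes):.4f}")
```

All three forecasts have roughly the same mean. Under these assumed numbers, the hedged forecaster already knows everything the confident one does and simply declines to say so.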

Then we combine that with this fact from the paper:

Second, among the unsuccessful replications, there was essentially no evidence for the original finding. The average relative effect size was very close to zero for the eight findings that failed to replicate according to the statistical significance criterion.

That means there was a clean cut. Thirteen of the studies successfully replicated. Eight of them not only didn’t replicate, but showed very close to no effect.

Now combine these facts: The rate of replication was estimated correctly. The studies were exactly correctly sorted by whether they would replicate. None of the studies that failed to replicate came close to replicating, so there was a ‘clean cut’ in the underlying scientific reality.

Some of the studies found real results. All others were either fraud, p-hacking or the light p-hacking of a bad hypothesis and small sample size. No in between.

The implementation of the prediction market used a market maker who began anchored to a 50% probability of replication. This, together with the fact that participants had limited tokens with which to trade (and thus had to prioritize which probabilities to move), explains some of the under-confidence in the individual results. The rest seems to be legitimate under-confidence.
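To illustrate why a 50% anchor plus a token budget pulls prices toward the middle, here is a sketch that assumes the market maker follows a logarithmic market scoring rule (LMSR). That mechanism, the function name, and the liquidity parameter `b = 100` are all assumptions made for illustration; the text above says only that the market maker started at 50%.

```python
import math


def lmsr_cost_to_move(start, target, b):
    """Tokens needed to move a binary LMSR price from `start` to `target`.

    Assumes a two-outcome LMSR with liquidity parameter `b` (hypothetical).
    Moving the price up means buying YES shares; moving it down, NO shares.
    """
    if not (0.0 < start < 1.0 and 0.0 < target < 1.0):
        raise ValueError("prices must be strictly between 0 and 1")
    if target >= start:
        return b * math.log((1.0 - start) / (1.0 - target))
    return b * math.log(start / target)


b = 100.0  # hypothetical liquidity parameter
for target in (0.60, 0.80, 0.95):
    cost = lmsr_cost_to_move(0.50, target, b)
    print(f"0.50 -> {target:.2f} costs {cost:.1f} tokens")
```

Under this assumed mechanism, each further step toward 0% or 100% costs more than the one before it. A trader with a fixed budget who is confident about several studies therefore does better by moving many prices part of the way than by pushing one price all the way.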

What we have here is an example of that elusive object, the unknown known: Things we don’t know that we know. This completes Rumsfeld’s 2×2. We pretend that we don’t have enough information to know which studies represent real results and which ones don’t. We are modest. We don’t fully update on information that doesn’t conform properly to the formal rules of inference, or the norms of scientific debate. We don’t dare make the claim that we know, even to ourselves.

And yet, we know.

What else do we know?