Even the crème de la crème of economics journals barely manage a ⅔ expected replication rate.
Is a two-thirds replication rate necessarily bad? This is an honest question, since I don’t know what the optimal replication rate would be. Seems worth noting that a) a 100% replication rate seems too high, since it would indicate that people were only doing boring experiments that were certain to replicate, and b) “replication rate” seems to mean “does the first replication attempt succeed”, and some fraction of replication attempts will fail due to random chance even if the effect is genuine.
I don’t think a high replication rate necessarily implies the experiments were boring. Suppose you do 10 experiments, but they’re all speculative and unlikely to be true: let’s say only one of them is looking at a true effect, BUT your sample sizes are enormous and you have a strict significance cutoff (a very small alpha). So you detect the one effect and get 9 nulls on the others. When people try to replicate them, they have a 100% success rate on both the positive and the negative results.
The fraction of attempts that will fail due to random chance depends on the power, and replicators tend to go for very high levels of power, so typically you’d have about 5% false negatives in the replications.
I think there’s an idea that a paper with a p=0.05 finding should replicate 95% of the time. If it doesn’t, then the p-value was wrong.
That’s not really what a p-value means though, right? The actual replication rate should depend on the prior and the power of the studies.
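To put rough numbers on that, here is a minimal sketch in Python. The function name and the example inputs are my own assumptions for illustration, not figures from any study. It also assumes the replication uses the same significance cutoff as the original, and that a false positive "replicates" with probability alpha (ignoring the sign of the effect).

```python
def expected_replication_rate(prior, power_original, alpha, power_replication):
    """Expected share of significant original findings that replicate.

    prior: assumed fraction of tested hypotheses that are true.
    power_original / power_replication: chance of detecting a true effect.
    alpha: significance cutoff, assumed identical in both studies.
    """
    # Among all tested hypotheses, the share that come out significant
    # because the effect is real vs. by chance.
    true_positives = prior * power_original
    false_positives = (1 - prior) * alpha

    # Probability that a significant original finding is a true effect.
    ppv = true_positives / (true_positives + false_positives)

    # True effects replicate with the replication's power;
    # false positives "replicate" only by chance, at rate alpha.
    return ppv * power_replication + (1 - ppv) * alpha


# Speculative field: 10% of hypotheses true, 80% power, alpha = 0.05,
# replications run at 95% power.
print(round(expected_replication_rate(0.1, 0.8, 0.05, 0.95), 3))  # 0.626

# Same studies, but half of the hypotheses are true.
print(round(expected_replication_rate(0.5, 0.8, 0.05, 0.95), 3))  # 0.897
```

Under these made-up inputs, a p < 0.05 finding replicates roughly 63% of the time in the speculative case and roughly 90% in the other, so the same cutoff is compatible with very different replication rates.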