This test might have been more useful if the way those 21 papers were chosen had been specified. If they were not sampled randomly, the fraction of bogus papers among those 21 could, for example, be much higher than the overall fraction among psychology papers published in Nature and Science. In that case, the evaluation would be biased in favor of predictors that tend to classify non-bogus papers as bogus.
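A small simulation makes this concrete. All numbers below are hypothetical, chosen only for illustration: a predictor that leans toward labeling papers "bogus" scores higher accuracy on an evaluation set enriched with bogus papers than it would at a lower base rate, even though the predictor itself is unchanged.

```python
import random

random.seed(0)

def accuracy(predictor_bogus_rate, sample_bogus_fraction, n=100_000):
    """Accuracy of a predictor that labels each paper 'bogus' with a
    fixed probability, independent of the paper, evaluated on a sample
    with the given bogus fraction (hypothetical rates, for illustration)."""
    correct = 0
    for _ in range(n):
        truly_bogus = random.random() < sample_bogus_fraction
        predicted_bogus = random.random() < predictor_bogus_rate
        correct += truly_bogus == predicted_bogus
    return correct / n

# A bogus-leaning predictor (labels "bogus" 80% of the time) looks better
# when 60% of the evaluated papers are bogus than when only 30% are:
print(accuracy(0.8, 0.6))  # ~0.56
print(accuracy(0.8, 0.3))  # ~0.38
```

Analytically, the expected accuracy is pq + (1-p)(1-q) for predictor rate p and sample fraction q, so the same predictor gains simply because the evaluation sample was enriched.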
From the replication project’s web page:

We will replicate 21 experimental studies in the social sciences published in Nature and Science in 2010-2015. These papers were objectively chosen because they were published in these two high-profile journals in this time period, they share a common structure in testing a treatment effect within or between subjects, they test at least one clear hypothesis with a statistically significant finding, and they were performed using students or accessible convenience samples. We plan to conduct the replications between September 2016 and September 2017.