Suppose, for simplicity, that there is either a real correlation of about this size, or no correlation at all. (This is of course a simplification, since there could also be correlations of smaller or greater size.)
So, if we assign a 5% prior to caffeine in fact helping with the Anne question, and if we assume that the chance of seeing a correlation at least this large, given that there is a real correlation, is 50% (I’m not sure whether this is reasonable, but I’m hoping the observed correlation would be approximately centered around the actual correlation), our posterior should be:
.05 × .5 / (.05 × .5 + .95 × .004) ≈ 87%.
If we assign a 1% prior to caffeine in fact helping with the Anne question, the posterior (under the same simplified assumptions) should be: 56%.
If we assign a 10% prior (under the same simplified assumptions), the posterior should be: 93%.
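The arithmetic behind all three posteriors can be sketched as a short function (here 0.004 is the p-value of the observed correlation from the discussion above, and 0.5 is the assumed chance of seeing a correlation at least this large given a real effect):

```python
def posterior(prior, p_obs_given_real=0.5, p_obs_given_chance=0.004):
    """Posterior P(real effect | observation), by Bayes' rule under the
    two-hypothesis simplification (real correlation vs. no correlation)."""
    numerator = prior * p_obs_given_real
    return numerator / (numerator + (1 - prior) * p_obs_given_chance)

for prior in (0.01, 0.05, 0.10):
    print(f"prior {prior:.0%} -> posterior {posterior(prior):.0%}")
# prior 1% -> posterior 56%
# prior 5% -> posterior 87%
# prior 10% -> posterior 93%
```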
I haven’t thought much about whether this is the correct way to formalize/simplify the question, so these numbers may be misleading.
You’re updating on the fact that we observed at least some value, when what we know is we observed exactly that value. I think this overstates the evidence against the null, because all those higher results would have been stronger evidence against the null than the actual result is, and you haven’t told the formalism that those higher results in fact didn’t happen. (ETA: I recommend the Goodman paper linked toward the end of the great-great-grandparent comment, if you haven’t seen it already. It sets upper bounds on the amount of Bayesian evidence you can get from any given p-value. The image with the table has some concrete numbers.)
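A minimal sketch of the kind of bound I have in mind (assuming the minimum-Bayes-factor formula exp(−z²/2), which I believe is the one tabulated in the Goodman paper):

```python
import math
from statistics import NormalDist

def min_bayes_factor(p_value, two_sided=True):
    """Lower bound on the Bayes factor (evidence for the null relative to
    the best-supported alternative) obtainable from a given p-value,
    using the exp(-z^2 / 2) bound attributed to Goodman."""
    tail = p_value / 2 if two_sided else p_value
    z = NormalDist().inv_cdf(1 - tail)  # z-score matching the p-value
    return math.exp(-z * z / 2)

# For p = 0.004 the bound is around 0.016, i.e. at most ~63:1 against
# the null -- weaker than the 0.5 / 0.004 = 125:1 ratio used above.
print(min_bayes_factor(0.004))
```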
If the results were by chance, that’s one reason why they probably wouldn’t replicate in Berkeley students, but there are other ways. For example, there could be some common factor like age in the LW population that caused both coffee use and correct answers to the Anne question, but that wasn’t important in the Berkeley population. Or they could just fail to replicate by coincidence, with a probability depending on what counts as “fail to replicate”. So even if I were, say, 95% convinced that this result wasn’t a coincidence, I think I still wouldn’t be more than say 70% sure that it would replicate.
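One way to see why 95% confidence in the result still leaves well under 95% confidence in replication is to decompose the replication probability (the conditional probabilities here are made up for illustration, not taken from anything above):

```python
# Hypothetical numbers: even granting the correlation is real, confounders
# (like the age example) and sampling noise can still sink a replication.
p_real = 0.95              # confidence the result wasn't a coincidence
p_rep_given_real = 0.73    # replication in Berkeley students, if real
p_rep_given_chance = 0.05  # a chance result could still "replicate" by luck

p_replicate = p_real * p_rep_given_real + (1 - p_real) * p_rep_given_chance
print(f"{p_replicate:.0%}")
```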