I believe you. Suppose they were just looking for any genotype that might interact; given the zillions of genotypes, there is a chance that some genotype will show an interaction purely by random chance.
If that’s the case, that the interaction occurred by random chance, then you wouldn’t expect it to recur in a second experiment. But is it really necessary to run two experiments? One could look for interactions in the first half of the observations and then check whether they are repeated in the second half. Even so, some false positives would remain, though with a smaller probability. Finally, I would expect any experimentalist working with such data to be aware of all this and to look for a signal with a false-positive probability below .05, so it would all be factored into the statistical analyses.
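To make this concrete, here is a minimal simulation sketch in Python. None of the numbers come from the paper; the genotype count, sample size, and the two-sample t-test are all invented for illustration. It just shows how many purely chance "interactions" clear p<.05 when thousands of genotypes are screened under the null, and how few of those survive the split-half replication check described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_genotypes = 5_000   # hypothetical number of genotypes screened
n_subjects = 200      # hypothetical sample size
alpha = 0.05

# Null world: the judgement scores are independent of every genotype, so any
# "significant" interaction found below is a false positive by construction.
scores = rng.normal(size=n_subjects)
genotypes = rng.integers(0, 2, size=(n_genotypes, n_subjects))  # carrier / non-carrier

def significant(score_subset, geno_subset):
    """Two-sample t-test of carriers vs non-carriers at level alpha."""
    carriers = score_subset[geno_subset == 1]
    others = score_subset[geno_subset == 0]
    return stats.ttest_ind(carriers, others).pvalue < alpha

half = n_subjects // 2
single_test_hits = 0
replicated_hits = 0
for g in genotypes:
    if significant(scores, g):
        single_test_hits += 1
    # Split-half check: the "effect" must show up independently in both halves.
    if significant(scores[:half], g[:half]) and significant(scores[half:], g[half:]):
        replicated_hits += 1

print(f"chance hits at p<.05 (whole sample): {single_test_hits} of {n_genotypes}")
print(f"chance hits surviving split-half replication: {replicated_hits}")
```

Under the null, roughly 5% of genotypes clear the single test and roughly .05² ≈ .25% survive the split-half check, so replication helps a great deal but still leaves a handful of spurious hits when the screen is large enough.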
… so in the paper, is the significance of the interaction calculated before or after accounting for the chance that some random genotype might show p<.05?
As I understand them, proper scientific standards demand that researchers pick a hypothesis and then test it using pre-chosen statistical methods, and, ideally, report their experimental results by publishing a paper whether or not the null hypothesis is rejected at the chosen significance level. Yudkowsky suggests that experimenters should be asked to publish papers before they conduct their experiments, to ensure that this good practice is upheld (and also that Bayesian likelihood ratios should be published instead of frequentist statistics).
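As a toy illustration of the likelihood-ratio suggestion (all numbers invented, and not how this paper analysed anything): instead of reporting only a p-value against the null, one reports how much better one hypothesis predicts the observed data than another.

```python
from scipy import stats

# Toy numbers, invented for illustration: a sample mean of 0.30 from 50 subjects,
# with the outcome assumed to have unit variance, compared under two simple hypotheses.
n = 50
observed_mean = 0.30
se = 1 / n ** 0.5  # standard error of the mean

likelihood_h0 = stats.norm.pdf(observed_mean, loc=0.0, scale=se)  # H0: no effect
likelihood_h1 = stats.norm.pdf(observed_mean, loc=0.5, scale=se)  # H1: effect of 0.5

# The likelihood ratio says how strongly the data favour H1 over H0,
# rather than just whether H0 can be rejected at some threshold.
print(f"likelihood ratio H1:H0 = {likelihood_h1 / likelihood_h0:.2f}")
```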
The (relatively) benign problem that is often seen is the “file drawer” effect, where only papers that reject the null hypothesis (i.e. have interesting results) are published. The more serious allegation is that researchers have data-dredged, i.e. conducted an experiment and then looked for plausible hypotheses that they can claim to have “tested” and found positive, or run various statistical analyses until they found one that suited them (it’s very optimistic to claim this doesn’t happen!). As you say, in this kind of investigation there are bound to be many such spurious results to exploit, particularly in a field where 5% significance is the standard.
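To put a rough number on how much the 5% standard gives away in a screen of this kind (the figures here are invented): if thousands of genotypes are each tested at the raw 5% bar, at least one spurious hit is essentially guaranteed, and a Bonferroni-style correction would demand a far stricter per-test threshold.

```python
# Invented numbers: a screen of 5,000 genotypes each tested at the raw 5% bar.
n_tests = 5_000
alpha_raw = 0.05
alpha_family = 0.05  # desired family-wise error rate

# Probability of at least one spurious hit if every test uses the uncorrected threshold.
p_any_false_positive = 1 - (1 - alpha_raw) ** n_tests
print(f"P(at least one spurious hit at raw p<.05): {p_any_false_positive:.6f}")

# Bonferroni: the per-test threshold needed to keep the family-wise error rate at 5%.
print(f"Bonferroni per-test threshold: {alpha_family / n_tests:.0e}")
```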
I was questioning the likelihood that the researchers really thought they had good cause to conduct an experiment to test the hypothesis that some apparently obscure genotype was causally related to people making utilitarian moral judgements. It doesn’t pass the laugh test for me, but that might just be my ignorance.