There’s nothing inherently wrong with data dredging. Considering all possible hypotheses and keeping the ones suggested by the data is just Solomonoff induction. It only becomes problematic if you don’t have a consistent prior, e.g. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior.
Hypothesis-driven has its place in the human practice of science, because humans have a hard time computing a prior after having seen the data. But that’s a problem with the humans, not with the math.
It only becomes problematic if you don’t have a consistent prior, i.e. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior.
If that were true, you would never need to hold out a validation set.
There’s nothing inherently wrong with data dredging. Considering all possible hypotheses and keeping the ones suggested by the data is just Solomonoff induction. It only becomes problematic if you don’t have a consistent prior, e.g. if you keep the hypothesis with the greatest likelihood ratio rather than the greatest posterior.
Hypothesis-driven has its place in the human practice of science, because humans have a hard time computing a prior after having seen the data. But that’s a problem with the humans, not with the math.
...or if you believe everything that has p<.05.
If that were true, you would never need to hold out a validation set.