My preferred approach (roughly analogous to train/test splits in ML): random-split your dataset in two right at the start; perform EDA etc. on the first half; use the second half only for pre-decided statistical tests at the very end of your investigation.
(This implicitly assumes that your dataset is something you were handed, instead of something you had to find and/or make; being able to choose when to stop looking for new rows and/or columns introduces an entire host of subtler problems.)
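A minimal sketch of what this split might look like in Python, assuming the dataset arrives as a pandas DataFrame; the function name `split_for_analysis` and the toy `df` are illustrative, not from the original:

```python
import numpy as np
import pandas as pd

def split_for_analysis(df: pd.DataFrame, seed: int = 0):
    """Randomly split rows into an exploration half and a held-out
    confirmation half, before any analysis touches the data."""
    rng = np.random.default_rng(seed)
    mask = rng.random(len(df)) < 0.5            # ~50/50 random assignment
    explore = df[mask].reset_index(drop=True)   # EDA, plots, hypothesis generation
    confirm = df[~mask].reset_index(drop=True)  # opened only for pre-decided tests
    return explore, confirm

# Toy data; in practice `df` is whatever dataset you were handed.
df = pd.DataFrame({"x": np.random.randn(1000), "y": np.random.randn(1000)})
explore, confirm = split_for_analysis(df, seed=42)
```

The point of fixing the seed and splitting before any exploration is that the confirmation half stays untouched by the choices you make during EDA, so the tests you eventually run on it aren't contaminated by data-dependent decisions.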