Not being stupid is an admirable goal, but it’s not well-defined.
It’s not a goal. It is a criterion you should apply to the steps which you intend to take. I admit to it not being well-defined :-)
Is there a standard term for the error you are referring to?
In statistics that used to be called “data mining” and was a bad thing. Data science repurposed the term and it’s now a good thing :-/ Andrew Gelman calls a similar phenomenon the “garden of forking paths” (see e.g. here).
Basically the problem is paying attention to noise.
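To make “paying attention to noise” concrete, here is a minimal sketch. It assumes a made-up setup (20 unrelated random variables with 30 observations each; the names and sizes are mine, not from your problem), computes every pairwise correlation, and reports the strongest one. Nothing is related to anything else by construction, yet the “winner” will typically look respectable.

```python
import itertools
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strongest_noise_correlation(n_vars=20, n_obs=30, seed=0):
    """Generate pure noise, try every pair, return (|r|, i, j) for the best pair."""
    rng = random.Random(seed)
    # Hypothetical data: independent standard normal draws, no real structure.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = itertools.combinations(range(n_vars), 2)  # 190 comparisons for 20 variables
    return max((abs(pearson(data[i], data[j])), i, j) for i, j in pairs)


best_r, i, j = strongest_noise_correlation()
print(f"strongest of 190 noise correlations: |r| = {best_r:.2f} (variables {i} and {j})")
```

If you then told a story about why those two variables go together, you would be explaining noise.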
Can’t I have my common sense, but make all possible comparisons anyway?
You can. It’s just that you shouldn’t attach undue importance to which comparison came first and which came second. You’re generating estimates, and at the very minimum you should also be generating what you think are the errors of your estimates. These should be helpful in establishing how meaningful your ranking of all the pairs is.
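A rough sketch of what I mean by attaching errors to the estimates. I am assuming, purely for illustration, that your “comparison” is a correlation between two columns and that a bootstrap standard error is an acceptable error estimate; the data, the column names, and the helper functions are all hypothetical.

```python
import itertools
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_se(x, y, n_boot=300, seed=0):
    """Bootstrap standard error: resample observations with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            reps.append(pearson([x[k] for k in idx], [y[k] for k in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with no variation; draw again
    return statistics.stdev(reps)


def ranked_pairs(columns):
    """Return (name_a, name_b, estimate, std_error) for every pair, best first."""
    rows = []
    for a, b in itertools.combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        se = bootstrap_se(columns[a], columns[b])
        rows.append((a, b, r, se))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical data: y and z are both x plus independent noise of the same size.
rng = random.Random(42)
base = [rng.gauss(0, 1) for _ in range(40)]
columns = {
    "x": base,
    "y": [v + rng.gauss(0, 1) for v in base],
    "z": [v + rng.gauss(0, 1) for v in base],
}
for a, b, r, se in ranked_pairs(columns):
    print(f"{a}-{b}: r = {r:+.2f} +/- {se:.2f}")
```

When the gap between two adjacent estimates is small relative to their standard errors, the order in which they came out tells you very little.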
And you still need to define a goal. For example, a goal of explanation/understanding is different from the goal of forecasting.
I’m not telling you to ignore the data. I’m telling you to be sceptical of what the data is telling you.
Thank you! Those data mining algorithms are exactly what I was looking for.
(Personally, I would describe the situation you are warning me against as reducing it “more than is possible” rather than “as much as possible”. I am definitely in favor of using common sense.)