Not being stupid is an admirable goal, but it’s not well-defined. I tried Googling “spaghetti factory analysis” and “spaghetti factory analysis statistics” for more information, but nothing turned up. Is there a standard term for the error you are referring to?
Can’t I have my common sense, but make all possible comparisons anyway just to inform my common sense as to the general directions in which the winds of evidence are blowing?
I don’t see how informing myself of correlations harms my common sense in any way. The only alternative I can think of is to stick to my prejudices and, whenever some doubt arises as to which of two prejudices has the stronger claim, thoroughly investigate real-world data to settle the dispute between them. As soon as that process is over, I should stop immediately because nothing else matters.
Is that the course of action you recommend?
Not being stupid is an admirable goal, but it’s not well-defined.
It’s not a goal. It is a criterion you should apply to the steps which you intend to take. I admit to it not being well-defined :-)
Is there a standard term for the error you are referring to?
In statistics that used to be called “data mining” and was a bad thing. Data science repurposed the term and it’s now a good thing :-/ Andrew Gelman calls a similar phenomenon the “garden of forking paths” (see e.g. here).
Basically the problem is paying attention to noise.
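To illustrate what paying attention to noise can look like, here is a minimal sketch, not anything from the thread itself: it generates columns of independent random numbers and then makes every pairwise comparison. The function names, the number of variables, and the sample size are all assumptions picked for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_noise_correlation(n_vars=30, n_obs=50, seed=0):
    """Build independent Gaussian noise columns (no real relationships),
    then return the largest absolute pairwise correlation and the
    number of pairs that were compared."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    corrs = [abs(pearson(a, b)) for a, b in itertools.combinations(cols, 2)]
    return max(corrs), len(corrs)


best, n_pairs = strongest_noise_correlation()
print(f"{n_pairs} pairs of pure noise; strongest |r| = {best:.2f}")
```

With hundreds of comparisons and only a few dozen observations each, the top-ranked pair tends to look like a real relationship even though, by construction, none exists.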
Can’t I have my common sense, but make all possible comparisons anyway
You can. It’s just that you shouldn’t attach undue importance to which comparison came first and which came second. You’re generating estimates, and at the very minimum you should also be generating what you think are the errors of those estimates. These should help establish how meaningful your ranking of all the pairs is.
And you still need to define a goal. For example, the goal of explanation/understanding is different from the goal of forecasting.
I’m not telling you to ignore the data. I’m telling you to be sceptical of what the data is telling you.
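One possible way to attach errors to correlation estimates before ranking them is a bootstrap, sketched below. This is only one option among several, and the data is synthetic: the variables `x`, `y1`, `y2`, the slopes, and the sample size are assumptions made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_se(xs, ys, n_boot=1000, seed=0):
    """Bootstrap standard error of the correlation: resample
    (x, y) pairs with replacement and measure how much the
    estimate varies across resamples."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))


# Synthetic data (hypothetical): two outcomes that depend on x
# with similar strength, observed with plenty of noise.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(40)]
y1 = [0.5 * v + rng.gauss(0, 1) for v in x]
y2 = [0.4 * v + rng.gauss(0, 1) for v in x]

r1, se1 = pearson(x, y1), bootstrap_se(x, y1)
r2, se2 = pearson(x, y2), bootstrap_se(x, y2)
print(f"x~y1: r = {r1:.2f} +/- {se1:.2f}")
print(f"x~y2: r = {r2:.2f} +/- {se2:.2f}")
```

If the gap between the two estimates is small compared with their standard errors, the order in which the pairs rank is not telling you much.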
Thank you! Those data mining algorithms are exactly what I was looking for.
(Personally, I would describe the situation you are warning me against as reducing it “more than is possible” rather than “as much as possible”. I am definitely in favor of using common sense.)