So when Robin Hanson wants to know the real effect of health spending on health, he doesn’t look for correlational control-variables studies on the effect of health spending on health, because he knows those studies will return whatever the researchers want them to say. What Robin does instead is look for studies that happen to control for health care spending, on the way to making some other point, and then look at what the correlation coefficient was in those studies, which aren’t putatively about healthcare; and according to Robin the coefficient is usually zero.
This is an example of clever data, obtained in spite of the researchers, which I might be inclined to trust, perhaps too much so for its cleverness. But the scarier moral is that correlational studies are bad enough, and by the time you add in control variables, the researchers usually get whatever result they want. If you trust a result at all in a correlational study, it should be because you think the researchers weren’t thinking about that result at all and were unlikely to ‘optimize’ it by accident while they were optimizing the study outcome they were interested in.
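To make the worry about control variables concrete, here is a toy simulation. Everything in it is a hypothetical assumption (the variable names `u1`, `u2`, `collider`, `noise`, the coefficients, and the list of candidate controls), not anything taken from the studies Robin looked at. By construction x has no causal effect on y, yet the estimated coefficient on x moves a lot depending on which subset of candidate controls goes into the regression.

```python
import itertools
import random


def ols(y, cols):
    """Least-squares coefficients for y ~ 1 + cols, solved from the normal equations."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    return [A[i][p] / A[i][i] for i in range(p)]


def simulate(n=2000, seed=0):
    """Hypothetical data in which x has NO causal effect on y."""
    rng = random.Random(seed)
    cols = {name: [] for name in ("x", "y", "u1", "u2", "collider", "noise")}
    for _ in range(n):
        u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = u1 - u2 + rng.gauss(0, 1)       # "spending" depends on u1 and u2
        y = u1 + u2 + rng.gauss(0, 1)       # "health" depends on u1 and u2, not on x
        collider = x + y + rng.gauss(0, 1)  # caused by both x and y
        row = {"x": x, "y": y, "u1": u1, "u2": u2,
               "collider": collider, "noise": rng.gauss(0, 1)}
        for name, value in row.items():
            cols[name].append(value)
    return cols


def specification_search(data):
    """Coefficient on x for every subset of the candidate controls."""
    candidates = ["u1", "u2", "collider", "noise"]
    results = {}
    for k in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, k):
            coefs = ols(data["y"], [data["x"]] + [data[c] for c in subset])
            results[subset] = coefs[1]  # index 0 is the intercept
    return results


if __name__ == "__main__":
    res = specification_search(simulate())
    for subset, b in sorted(res.items(), key=lambda kv: kv[1]):
        print(f"{b:+.2f}  controls={list(subset) or 'none'}")
```

In this made-up setup the population values are roughly -0.5 with only `u1` controlled, +0.5 with only `u2`, -0.75 with only the collider, and 0 with `u1` and `u2` together, so a researcher who only reports one specification can land on almost any sign.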
But the scarier moral is that correlational studies are bad enough, and by the time you add in control variables, the researchers usually get whatever result they want.
Hmm.
Here’s my thinking about this in the context of the post.
If the presence of trait A precedes the presence of trait B, and there’s correlation between trait A and trait B, then this establishes a prior that trait A causes trait B. The strength of the prior depends (in some sense) on the number of traits correlated with trait A that precede the presence of trait B, and one updates from the prior based on the plausibility of causal pathways in each case.
In the case of college attended and earnings, we have two hypotheses (which between them account for the bulk of the probability mass) as to the source of the correlation: (i) going to a more selective college increases earnings, and (ii) traits that get people into more selective colleges increase earnings.
To test for (ii), one controls for features that feed into college admissions. GPA and SAT scores are the easiest of these to obtain data on, but there are others, such as class rank, extracurricular activities, essays, whether one is a strong athlete, whether one’s parents are major donors to the college, etc. To pick up on some of these, the authors control for the average SAT score of the colleges that the student applied to, and for the number of applications submitted, which measures the student’s confidence that he or she can get into selective colleges (the intuition being that if a student submits only a small number of applications and applies only to top colleges, he or she has confidence that he or she will get into one).
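A toy version of the test for (ii), under loudly hypothetical assumptions: an unobserved "applicant strength" raises both the selectivity of the college attended and later earnings, selectivity itself has no effect, and a noisy proxy (standing in for something like the average SAT score of the colleges applied to) is available as a control. The names and parameters are invented for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def two_var_slopes(y, x1, x2):
    """OLS slopes of y on (x1, x2) with intercept, via Cramer's rule on the covariance matrix."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def simulate(n=5000, proxy_noise_sd=0.5, seed=2):
    rng = random.Random(seed)
    selectivity, earnings, proxy = [], [], []
    for _ in range(n):
        strength = rng.gauss(0, 1)  # hypothetical unobserved applicant strength
        selectivity.append(strength + rng.gauss(0, 1))         # strong applicants attend selective colleges
        earnings.append(strength + rng.gauss(0, 1))            # strength raises earnings; selectivity does NOT
        proxy.append(strength + rng.gauss(0, proxy_noise_sd))  # hypothetical admissions-based proxy
    return selectivity, earnings, proxy


def estimates(proxy_noise_sd):
    """(naive slope, slope after controlling for the proxy) of earnings on selectivity."""
    sel, earn, prox = simulate(proxy_noise_sd=proxy_noise_sd)
    controlled, _ = two_var_slopes(earn, sel, prox)
    return slope(earn, sel), controlled


if __name__ == "__main__":
    print(estimates(0.5))
    print(estimates(0.0))
```

With these invented parameters the naive slope is about 0.5 and the proxy-controlled slope is about 0.17, not 0: a proxy measured with noise only partly soaks up the confounding. A perfect proxy (`proxy_noise_sd=0.0`) drives the slope to roughly 0.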
The question is then whether there are sufficiently many other metrics (with large publicly available data sets) of the characteristics that get students into college so that the authors could have cherry-picked ones that move the correlation to be statistically indistinguishable from 0. Can you name five?
If the presence of trait A precedes the presence of trait B
You mean precedes in time? What if A is my paternal grandfather’s eye color (black), and B is my eye color (black)? Our eye colors are correlated due to common ancestry, and A precedes B in time. But A does not cause B. There are lots of correlated things in the world due to a common cause, and generally one of them precedes another in time.
You can’t talk about correlation and time like that. I think the only thing we can say is that macroscopic retrocausation should probably be disallowed.
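A minimal sketch of the common-cause point, with an abstract hypothetical model rather than real eye-color genetics: a background factor `u` produces A at an earlier time and B at a later time, B's equation never mentions A, and yet A and B correlate. Setting A ourselves (independently of `u`) makes the correlation vanish.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate(n=10000, intervene=False, seed=3):
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)          # hypothetical common cause (e.g. shared ancestry)
        if intervene:
            a = rng.gauss(0, 1)      # do(A): A is set independently of u
        else:
            a = u + rng.gauss(0, 1)  # time 1: A is produced by u
        b = u + rng.gauss(0, 1)      # time 2: B is produced by u; A never enters
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


if __name__ == "__main__":
    print("observational corr(A, B):", round(pearson(*simulate()), 2))
    print("corr(A, B) under do(A):  ", round(pearson(*simulate(intervene=True)), 2))
```

In this model the observational correlation should come out near 0.5 and the interventional one near 0, even though A always comes first in time.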
The way interventionists think about effects is that the effect of A on B in a person C is really about how B would change in a hypothetical person C’ who differs from C only in that we changed their A. It’s not about correlation, dependence, temporal order, or anything like that.
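The interventionist definition can be sketched with an assumed structural model: hold one person's exogenous background fixed (person C), overwrite A to get C’, and recompute B. The `Person` fields and the two candidate equations for B are hypothetical illustrations, not a standard library API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Exogenous background of one individual, held fixed across counterfactuals."""
    u: float        # shared background factor
    noise_a: float  # idiosyncratic influences on A
    noise_b: float  # idiosyncratic influences on B


BEquation = Callable[[float, Person], float]


def realize(p: Person, b_eq: BEquation, set_a: Optional[float] = None) -> Tuple[float, float]:
    """Compute (A, B) for person p; set_a replaces A's own equation (an intervention)."""
    a = p.u + p.noise_a if set_a is None else set_a
    return a, b_eq(a, p)


def individual_effect(p: Person, b_eq: BEquation, new_a: float) -> float:
    """B in C' (same background as C, but A set to new_a) minus B in C."""
    _, b_actual = realize(p, b_eq)
    _, b_counterfactual = realize(p, b_eq, set_a=new_a)
    return b_counterfactual - b_actual


def common_cause(a: float, p: Person) -> float:
    return p.u + p.noise_b            # B ignores A entirely


def a_causes_b(a: float, p: Person) -> float:
    return 0.5 * a + p.u + p.noise_b  # B responds to A with slope 0.5


if __name__ == "__main__":
    c = Person(u=1.0, noise_a=0.2, noise_b=-0.3)  # A = 1.2 for this person
    print(individual_effect(c, common_cause, new_a=2.2))
    print(individual_effect(c, a_causes_b, new_a=2.2))
```

Raising A by 1 for this person gives an effect of exactly 0 under `common_cause` and 0.5 (up to floating-point rounding) under `a_causes_b`, whatever the population correlation between A and B happens to be.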
This approach might work sometimes, but I think it is problematic in most cases for the following reason:
Health care spending can only affect health through medical interventions (unless it is possible to extend someone’s life by signalling that you care enough to spend money on health care).
If the study is designed to estimate the effect of some medical intervention, that intervention will be in the regression model. If you want to interpret the coefficient for health care spending causally, you have a major problem in that the primary causal pathway has been blocked by conditioning on whether the patient got the intervention. In such situations, the coefficient of health care spending would be expected to be zero even if it has a causal effect through the intervention.
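A small simulation of the blocked-pathway problem, with invented numbers: in this hypothetical model spending matters for health only by raising the chance of getting a binary intervention. Regressing health on spending alone shows a clear positive slope; adding the intervention indicator as a regressor drives the spending coefficient to about zero, even though spending does have a causal effect.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def two_var_slopes(y, x1, x2):
    """OLS slopes of y on (x1, x2) with intercept, via Cramer's rule on the covariance matrix."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def simulate(n=10000, seed=4):
    rng = random.Random(seed)
    spending, treated, health = [], [], []
    for _ in range(n):
        s = rng.gauss(0, 1)                          # standardized health care spending
        m = 1.0 if s + rng.gauss(0, 1) > 0 else 0.0  # more spending -> more likely to get the intervention
        h = 2.0 * m + rng.gauss(0, 1)                # spending affects health ONLY through m
        spending.append(s)
        treated.append(m)
        health.append(h)
    return spending, treated, health


if __name__ == "__main__":
    s, m, h = simulate()
    print("health ~ spending:               ", round(slope(h, s), 2))
    print("health ~ spending + intervention:", round(two_var_slopes(h, s, m)[0], 2))
```

Under these assumptions the first slope is roughly 0.56 and the second is near 0, which is the pattern you would see in a study that includes the intervention in its model.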