You’re using correlation in what I would consider a weird way. Randomization is intended to control for selection effects and reduce confounds, but when somebody says “correlational study” I take them to mean an observational study in which no attempt was made to determine predictive causation. When an effect shows up in a nonrandomized study, it’s not that you can’t determine whether the effect was causative; it’s that it’s more difficult to determine whether the causation was due to the independent variable or to an extraneous variable unrelated to it. It’s not a question of whether the effect is due to correlation or causation, but of whether the relationship between the independent and dependent variables exists at all.
(1) Observational studies are almost always attempts to determine causation. Sometimes the investigators try to pretend that they aren’t, but they aren’t fooling anyone, least of all the general public. I know they are attempting to determine causation because nobody would be interested in the results of the study unless they were interested in causation. Moreover, I know they are attempting to determine causation because they do things like “control for confounding”. This procedure is undefined unless the goal is to estimate a causal effect.
(2) What do you mean by the sentence “the study was causative”? Surely nobody is suggesting that the study itself had an effect on the dependent variable?
(3) Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists. The correlation is real, even if it is due to confounding. It just doesn’t represent a causal effect.
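To make point (3) concrete, here is a toy simulation (my own invented numbers, not from any study): Z is a common cause of X and Y, and X has no causal effect on Y whatsoever, yet the X–Y correlation is perfectly real.

```python
import random

# Toy simulation (made-up numbers): Z is a common cause of X and Y,
# and X has NO causal effect on Y. The X-Y correlation is still real:
# it reflects the data-generating process, not sampling noise.
random.seed(0)
n = 100_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]  # Z -> X
y = [zi + random.gauss(0, 1) for zi in z]  # Z -> Y; X plays no role

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

print(round(corr(x, y), 2))  # roughly 0.5, with zero causal effect
```

The correlation would persist at any sample size; it is not an artifact of the sample, it just isn’t causal.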
You are assuming a couple of things which are almost always true in your (medical) field, but are not necessarily true in general. For example,
Observational studies are almost always attempts to determine causation
Nope. Another very common reason is to create a predictive model without caring about actual causation. If you can’t do interventions but would like to forecast the future, that’s all you need.
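For instance, here is a sketch of the classic barometer setup (hypothetical numbers): the barometer reading predicts rain well, and that is all a forecaster needs, even though tapping the barometer will not change the weather.

```python
import random

# Hypothetical example: pressure Z causes both the barometer reading X and
# rain Y. X does not cause Y, yet regressing Y on X gives a perfectly
# usable forecaster, so long as nobody intervenes on X.
random.seed(1)
n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.1) for zi in z]  # barometer tracks pressure
y = [zi + random.gauss(0, 1) for zi in z]    # rain is driven by pressure

# Ordinary least-squares slope for predicting Y from X
mx, my = sum(x) / n, sum(y) / n
beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
       sum((xi - mx) ** 2 for xi in x)
print(round(beta, 2))  # close to 1: good predictions, no causal claim
```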
Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists.
That further assumes your underlying process is stable and is not subject to drift, regime changes, etc. Sometimes you can make that assumption, sometimes you cannot.
Another very common reason is to create a predictive model without caring about actual causation. If you can’t do interventions but would like to forecast the future, that’s all you need.
You’d also like a guarantee that others can’t do interventions, or else your measure could be gamed. (But if there’s an actual causal relationship, then ‘gaming’ isn’t really possible.)
(1) I just think calling a nonrandomized study a correlational study is weird.
(2) I meant to say effect, not study; fixed.
(3) If something is caused by a confounding variable, then the independent variable may have no relationship with the dependent variable. You seem to be using correlation to mean the result of an analysis, but I’m thinking of it as the actual real relationship which is distinct from causation. So y=x does not mean y causes x or that x causes y.
I don’t understand what you mean by “real relationship”. I suggest tabooing the terms “real relationship” and “no relationship”.
I am using the word “correlation” to discuss whether the observed variable X predicts the observed variable Y in the (hypothetical?) superpopulation from which the sample was drawn. Such a correlation can exist even if neither variable causes the other.
If X predicts Y in the superpopulation (regardless of causality), the correlation will indeed be real. The only possible definition I can think of for a “false” correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability. Statistical methodology is in general more than adequate to discuss whether the appearance of correlation in your sample is due to real correlation in the superpopulation. You do not need causal inference to reason about this question. Moreover, confounding is not relevant.
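A quick toy sketch of that failure mode: X and Y are independent in the superpopulation, yet small samples routinely show sizable sample correlations by chance alone, and the worst case shrinks as n grows.

```python
import random

# X and Y are independent in the superpopulation; any correlation that
# appears in a sample is pure sampling variability, and it shrinks with n.
random.seed(2)

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

max_abs_r = {}
for n in (10, 100, 1_000):
    draws = []
    for _ in range(200):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [random.gauss(0, 1) for _ in range(n)]  # truly unrelated to x
        draws.append(abs(corr(x, y)))
    max_abs_r[n] = max(draws)
    print(n, round(max_abs_r[n], 2))  # worst spurious |r| shrinks as n grows
```

Note that confounding plays no role in this simulation, which is exactly the point: this is a purely statistical question.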
Confounding and causal inference are only relevant if you want to know whether the correlation in the superpopulation is due to the causal effect of X on Y. You can certainly define the causal effect as the “actual real relationship”, but then I don’t understand how it is distinct from causation.
The only possible definition I can think of for a “false” correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability.
Right. Which is the problem randomization attempts to correct for, which I think of as a separate problem from causation.
Intersample variability is a type of confound. Increasing sample size is another method for reducing confounding due to intersample variability. Maybe you meant intrasample variability, but that doesn’t make much sense to me in context. Maybe you think of intersample variability as sampling error? Or maybe you have a weird definition of confounding?
Either way, confounding is a separate problem from causation. You can isolate the confounding variables from the independent variable to determine the correlation between x and y without determining a causal relationship. You can also determine the presence of a causal relationship without isolating the independent variable from possible confounding variables.
The nonrandomized studies are determining causality; they’re just doing a worse job at isolating the independent variable, which is what gwern appears to be talking about here.
Or maybe you have a weird definition of confounding?
I use the standard definition of confounding, based on whether E[Y | X = x] = E[Y | do(X = x)], and I think about it in terms of whether there exists a backdoor path between X and Y.
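A sketch of that definition with a binary confounder (invented numbers): the structure is Z → X, Z → Y, with a true causal effect of X on Y of exactly +1. The naive conditional contrast is biased because E[Y | X = x] ≠ E[Y | do(X = x)] here, while the backdoor adjustment formula E[Y | do(X = x)] = Σ_z E[Y | X = x, Z = z] P(Z = z) recovers the effect.

```python
import random

# Toy model: Z -> X, Z -> Y, and a true causal effect of X on Y of +1.
random.seed(3)
n = 100_000
rows = []
for _ in range(n):
    z = random.random() < 0.5
    x = random.random() < (0.8 if z else 0.2)  # Z pushes X up: backdoor path
    y = (1.0 if x else 0.0) + (2.0 if z else 0.0) + random.gauss(0, 0.1)
    rows.append((z, x, y))

def mean(vals):
    return sum(vals) / len(vals)

# Naive contrast E[Y|X=1] - E[Y|X=0]: biased, since E(Y|X=x) != E(Y|do(X=x))
naive = mean([y for z, x, y in rows if x]) - mean([y for z, x, y in rows if not x])

# Backdoor adjustment: E[Y|do(X=x)] = sum over z of E[Y|X=x, Z=z] * P(Z=z)
p_z1 = mean([1.0 if z else 0.0 for z, x, y in rows])
def e_y_do(xval):
    return sum(
        mean([y for z, x, y in rows if x == xval and z == zval])
        * (p_z1 if zval else 1 - p_z1)
        for zval in (False, True)
    )

adjusted = e_y_do(True) - e_y_do(False)
print(round(naive, 1), round(adjusted, 1))  # naive overshoots; adjusted ~ 1.0
```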
Either way, confounding is a separate problem from causation.
The concept of confounding is defined relative to the causal query of interest. If you don’t believe me, try to come up with a coherent definition of confounding that does not depend on the causal question.
You can isolate the confounding variables from the independent variable to determine the correlation between x and y without determining a causal relationship.
With standard statistical techniques you will be able to determine the correlation between X and Y. You will also be able to determine the correlation between X and Y conditional on Z. These are both valid questions, and they are both true correlations. Whether either of those correlations is interesting depends on your causal question and on whether Z is a confounder for that particular query.
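A toy illustration of the two quantities (invented numbers): with Z a common cause of X and Y and no causal X → Y effect, the marginal correlation and the Z-conditional correlation are both real features of the process, and here they differ sharply.

```python
import random

# Z is a common cause of X and Y; there is no causal X -> Y effect.
# The marginal X-Y correlation and the correlation conditional on Z are
# both "true" correlations; they just answer different questions.
random.seed(4)
n = 100_000
data = []
for _ in range(n):
    z = random.gauss(0, 1)
    x = z + random.gauss(0, 1)
    y = z + random.gauss(0, 1)
    data.append((z, x, y))

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

marginal = corr([x for z, x, y in data], [y for z, x, y in data])

# "Conditional on Z" approximated by correlating within a thin slice of Z
stratum = [(x, y) for z, x, y in data if abs(z) < 0.1]
conditional = corr([x for x, y in stratum], [y for x, y in stratum])
print(round(marginal, 2), round(conditional, 2))  # about 0.5 and about 0
```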
You can also determine the presence of a causal relationship without isolating the independent variable from possible confounding variables.
No you can’t. (Unless you have an instrumental variable, in which case you have to make the assumption that the instrument is unconfounded instead of the treatment of interest.)
(re: last sentence, also have to assume no direct effect of instrument, but I am sure you knew that, just emphasizing the confounding assumption since discussion is about confounding).
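A sketch of the Wald/IV estimator under exactly those two assumptions (unconfounded instrument, no direct effect on the outcome), with made-up parameters: naive OLS is biased by the unobserved confounder, while the instrument recovers the causal effect.

```python
import random

# Hypothetical IV setup: U confounds X and Y; the instrument Z affects X,
# is independent of U (unconfounded), and has no direct effect on Y
# (exclusion restriction). The Wald ratio cov(Z,Y)/cov(Z,X) then recovers
# the causal effect of X on Y; the naive OLS slope does not.
random.seed(5)
n = 100_000
true_effect = 2.0
zs, xs, ys = [], [], []
for _ in range(n):
    u = random.gauss(0, 1)            # unobserved confounder
    z = random.gauss(0, 1)            # instrument
    x = z + u + random.gauss(0, 1)
    y = true_effect * x + 3.0 * u + random.gauss(0, 1)
    zs.append(z)
    xs.append(x)
    ys.append(y)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

ols = cov(xs, ys) / cov(xs, xs)    # biased upward by U
wald = cov(zs, ys) / cov(zs, xs)   # close to the true effect of 2.0
print(round(ols, 1), round(wald, 1))
```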
Grandparent’s attitude exemplifies precisely what is wrong with LW culture: a complete and utter lack of epistemic/social humility (which I think it inherited from Yudkowsky and his planet-sized ego). Him telling you, of all people, that you are using a weird definition of confounding is incredibly amusing.
Right. Which is the problem randomization attempts to correct for, which I think of as a separate problem from causation.
No. Randomization abolishes confounding, not sampling variability.
If your problem is sampling variability, the answer is to increase the power.
If your problem is confounding, the ideal answer is randomization and the second-best answer is modern causality theory.
Statisticians study the first problem; causal inference people study the second.
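A quick simulation of the distinction (all numbers invented): the bias of an observational estimate does not shrink as the sample grows, while a randomized estimate is centered on the true effect at any sample size.

```python
import random

# Observational bias persists however large the sample; randomization
# removes it because the coin flip cuts the link from the confounder to X.
random.seed(6)
TRUE_EFFECT = 1.0

def estimate(n, randomize):
    treated, control = [], []
    for _ in range(n):
        u = random.gauss(0, 1)  # confounder
        if randomize:
            x = random.random() < 0.5  # coin flip: U cannot influence X
        else:
            x = random.random() < (0.9 if u > 0 else 0.1)  # U drives treatment
        y = TRUE_EFFECT * (1.0 if x else 0.0) + 2.0 * u + random.gauss(0, 1)
        (treated if x else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

obs_small = estimate(1_000, randomize=False)
obs_big = estimate(100_000, randomize=False)   # more power, same bias
rct = estimate(10_000, randomize=True)         # confounding abolished
print(round(obs_small, 1), round(obs_big, 1), round(rct, 1))
```

Increasing n tightens all three estimates around their targets; it is just that only the randomized design has the true effect as its target.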
The nonrandomized studies are determining causality; they’re just doing a worse job at isolating the independent variable, which is what gwern appears to be talking about here.
No it isn’t.
Anders_H, you are much more patient than I am!
I just realized the randomized-nonrandomized study was just an example and not what you were talking about.