Biased data is a real thing and this is a great example. No method can solve the problem you’ve given without additional information.
This is not biased data. No one tampered with it. No one preferentially left out some data. There is no Cartesian demon tampering with you. It’s a perfectly ordinary causal problem for which one has all the available data. If you run a regression on the data, you will get accurate predictions of future similar data, just not of what happens when you intervene and realize the counterfactual. You can’t throw your hands up and disdainfully refuse to solve the problem, proclaiming, ‘oh, that’s biased’. It may be hard, and the best available solution may be weak or require strong assumptions, but if that is the case, the correct method should say as much and specify what additional data or interventions would allow stronger conclusions.
I’m not certain why I used the word “bias”. I think what I was getting at is that the data isn’t representative of the population of interest.
Regardless, no other method can solve the problem specified without additional information (which you claimed). And with additional information, it’s straightforward prediction again.
That is, condition on their prior health status and on the prior probabilities, not just on the fact that they’ve been given the drug.
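To make the point about conditioning concrete, here is a toy simulation. It is only a sketch under made-up assumptions: prior health is a single binary variable, sicker patients are more likely to get the drug, and the drug has a fixed additive benefit. None of these numbers or function names come from the original problem. It compares the raw treated-vs-untreated difference with the difference after stratifying on prior health.

```python
import random


def simulate(n=20000, seed=0, true_effect=1.0):
    """Generate (sick, drug, outcome) rows from a hypothetical confounded process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sick = rng.random() < 0.5            # prior health status (the confounder)
        p_drug = 0.8 if sick else 0.2        # assumption: sicker patients get treated more often
        drug = rng.random() < p_drug
        outcome = (-3.0 if sick else 0.0) + (true_effect if drug else 0.0) + rng.gauss(0, 1)
        rows.append((sick, drug, outcome))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Difference in mean outcome, treated minus untreated, ignoring prior health."""
    treated = [y for _, d, y in rows if d]
    untreated = [y for _, d, y in rows if not d]
    return mean(treated) - mean(untreated)


def adjusted_effect(rows):
    """Stratify on prior health, then average the per-stratum differences by stratum share."""
    total = 0.0
    for stratum in (True, False):
        sub = [r for r in rows if r[0] == stratum]
        treated = [y for _, d, y in sub if d]
        untreated = [y for _, d, y in sub if not d]
        total += (mean(treated) - mean(untreated)) * len(sub) / len(rows)
    return total


if __name__ == "__main__":
    rows = simulate()
    print("naive difference:   ", round(naive_effect(rows), 2))
    print("adjusted difference:", round(adjusted_effect(rows), 2))
```

With these invented parameters the naive difference comes out negative (the drug looks harmful, because the treated group is mostly the sick group), while the stratified estimate lands close to the simulated benefit of 1.0. The adjustment only works here because prior health status was recorded, which is exactly the additional information being argued about.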
What do you call “solving the problem”?
Any method will output some estimates. Some methods will output better estimates, some worse. As people have pointed out, this was an example of a real problem and yes, real-life data is usually pretty messy. We need methods that can handle messy data, not ones that work only on spherical cows in a vacuum.