Look, HIV patients who get HAART die more often (because people who get HAART are already very sick). We don’t get to see the health status confounder because we don’t get to observe everything we want. Given this, is HAART in fact killing people, or not?
Well, of course I can’t give the right answer if the right answer depends on information you’ve just specified I don’t have.
If something does handle the confounder properly, it’s not EDT anymore (because it’s not going to look at E[death|HAART]).
Again, I think there is a nontrivial selection bias / reference class issue here that needs to be addressed. The appropriate reference class for deciding whether to give HAART to an HIV patient is not just the set of all HIV patients who’ve been given HAART, precisely because of the possibility of confounders.
I think discussions of AIXI, source-code aware agents, etc. in the context of decision theories are a bit sterile because they are very far from actual problems people want to solve (e.g. is this actual non-hypothetical drug killing actual non-hypothetical people?)
In actual problems people want to solve, people have the option of acquiring more information and working from there. It’s plausible that with enough information even relatively bad decision theories will still output a reasonable answer (my understanding is that this kind of phenomenon is common in machine learning, for example). But the general question of what to do given a fixed amount of information remains open and is still interesting.
Well, of course I can’t give the right answer if the right answer depends on information you’ve just specified I don’t have.
I think there is “the right answer” here, and I think it does not rely on observing the confounder. If your decision theory does, then (a) your decision theory isn’t as smart as it could be, and (b) you are needlessly restricting yourself to certain types of decision theories.
The appropriate reference class for deciding whether to give HAART to an HIV patient is not just the set of all HIV patients who’ve been given HAART, precisely because of the possibility of confounders.
People have been thinking about confounders for a long time (the earliest reference known to me to a “randomized” trial is the book of Daniel; see also this: http://ije.oxfordjournals.org/content/33/2/247.long). There is a lot of nice clever math, developed in the last 100 years or so, that gets around unobserved confounders. Saying “well, we just need to observe confounders” is sort of silly. That’s like saying “well, if you want to solve this tricky computational problem, forget about developing new algorithms and that whole computational complexity thing, and just buy more hardware.”
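As a concrete (toy) illustration of the kind of math meant here, the sketch below uses the front-door formula to recover the effect of a treatment from observed data alone, even though the confounder is never observed. The setup, variable names, and all the numbers are invented for illustration, not taken from Ilya’s actual example: a hidden “health” confounder drives both treatment and death, and the treatment acts on death only through a mediator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hidden confounder U: 1 = patient is very sick (never observed below).
U = (rng.random(n) < 0.5).astype(int)
# Treatment A (HAART): sicker patients are far more likely to be treated.
A = (rng.random(n) < np.where(U == 1, 0.8, 0.2)).astype(int)
# Mediator M (say, viral suppression): depends on A only -- the front-door
# assumption is that U does not touch M except through A.
M = (rng.random(n) < np.where(A == 1, 0.7, 0.2)).astype(int)
# Outcome Y (death): driven up by U, down by M; A acts only through M.
Y = (rng.random(n) < 0.2 + 0.5 * U - 0.15 * M).astype(int)

# Naive conditional comparison: the drug looks harmful (confounded by U).
naive_diff = Y[A == 1].mean() - Y[A == 0].mean()

def frontdoor(a):
    """Estimate P(Y=1 | do(A=a)) by the front-door formula,
    using only the observed variables A, M, Y."""
    est = 0.0
    for m in (0, 1):
        p_m_given_a = (M[A == a] == m).mean()
        inner = 0.0
        for a2 in (0, 1):
            inner += (A == a2).mean() * Y[(M == m) & (A == a2)].mean()
        est += p_m_given_a * inner
    return est

fd_treated, fd_untreated = frontdoor(1), frontdoor(0)
print(naive_diff, fd_treated - fd_untreated)
```

By construction the true interventional risks are 0.345 (treated) and 0.420 (untreated), so the drug helps; the naive comparison shows the opposite sign, and the front-door estimate recovers the truth without ever seeing U.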
In actual problems people want to solve, people have the option of acquiring more information and working from there.
I don’t know what kind of actual problems you work on, but the reality of life in stats, medicine, etc. is that you have your dataset and you’ve got to draw conclusions from it. The dataset is crappy—there is probably selection bias, all sorts of missing data, censoring, things we would really have liked to know but which were never collected, etc. This is just a fact of life for folks in the trenches in the empirical sciences/data analysis. The right answer here is not denial, but new methodology.
For non-experts in the thread, what’s the name of this area, and is there a particular introductory text you would recommend?
Thanks for your interest! The name of the area is “causal inference.” Keywords: “standardization” (in epidemiology), “confounder or covariate adjustment”, “propensity score”, “instrumental variables”, “back-door criterion”, “front-door criterion”, “g-formula”, “potential outcomes”, “ignorability”, “inverse probability weighting”, “mediation analysis”, “interference”, etc.
Pearl’s Causality book (http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X/ref=pd_sim_sbs_b_1) is a good overview (but doesn’t talk a lot about statistics/estimation). Early references are Sewall Wright’s path analysis paper from 1921 (http://naldc.nal.usda.gov/download/IND43966364/PDF) and Neyman’s paper on potential outcomes from 1923 (http://www.ics.uci.edu/~sternh/courses/265/neyman_statsci1990.pdf). People say either Sewall Wright or his dad invented instrumental variables also.
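To make one of those keywords concrete, here is a toy inverse-probability-weighting sketch (the variable names and all the numbers are invented for illustration). Sicker patients are more likely to be treated, so the naive conditional comparison makes a helpful drug look harmful; reweighting each patient by the inverse of their estimated propensity score recovers the interventional risks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Observed confounder Z (say, an indicator of low CD4 count).
Z = (rng.random(n) < 0.5).astype(int)
# Sicker patients (Z=1) get treated 90% of the time, others 10%.
A = (rng.random(n) < np.where(Z == 1, 0.9, 0.1)).astype(int)
# Death risk: raised a lot by Z, lowered by 0.1 by the treatment.
Y = (rng.random(n) < 0.1 + 0.4 * Z - 0.1 * A).astype(int)

# Naive comparison: treatment looks harmful because of confounding by Z.
naive_diff = Y[A == 1].mean() - Y[A == 0].mean()

# Propensity scores estimated within levels of Z (simple group means).
ps = np.where(Z == 1, A[Z == 1].mean(), A[Z == 0].mean())

# Inverse-probability-weighted estimates of E[Y | do(A=a)].
ipw_treated = np.mean(A * Y / ps)
ipw_untreated = np.mean((1 - A) * Y / (1 - ps))
ipw_diff = ipw_treated - ipw_untreated
print(naive_diff, ipw_diff)
```

By construction the interventional risks are 0.2 (treated) and 0.3 (untreated), so the treatment helps; the IPW estimates land near those values while the naive difference is large and positive. Note this only works because the confounder here is observed; the unobserved-confounder case needs the other machinery in the keyword list.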
Thanks
You’re sort of missing what Ilya is trying to say. You might have to look at the actual details of the example he is referring to for this to make sense. The general idea is that even though we can’t observe certain variables, we still have enough evidence to justify the causal model in which HAART leads to fewer deaths, so we can conclude that we should prescribe it.
I would object to Ilya’s more general point, though. Saying that EDT would use E(death|HAART) to determine whether to prescribe HAART is making the same sort of reference class error you discuss in the post. EDT agents use EDT, not the procedures used to assign A0 and A1 in the example, so we really need to calculate E(death|EDT agent prescribes HAART). I would expect this to produce essentially the same result as a Pearlian E(death|do(HAART)), and would probably regard it as a failure of EDT if it did not add up to the same thing, but I think there is value in discovering how exactly this works out, if it does.
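One way to see the reference-class point numerically (a toy sketch; the setup and numbers are mine, not from Ilya’s example): if treatment is assigned by a rule that never looks at the confounder (here, a fair coin standing in for an agent whose decision is independent of patient health), then conditioning on “the rule prescribes HAART” agrees with do(HAART), while conditioning within the old observational reference class does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hidden health status U: 1 = very sick.
U = (rng.random(n) < 0.5).astype(int)

# Observational regime: doctors treat the sick far more often.
A_obs = (rng.random(n) < np.where(U == 1, 0.8, 0.2)).astype(int)
Y_obs = (rng.random(n) < 0.2 + 0.5 * U - 0.1 * A_obs).astype(int)

# "Agent" regime: treatment chosen by a rule that never sees U
# (a coin flip, standing in for a decision rule independent of health).
A_new = (rng.random(n) < 0.5).astype(int)
Y_new = (rng.random(n) < 0.2 + 0.5 * U - 0.1 * A_new).astype(int)

# E[death | HAART] in the old reference class: inflated by confounding.
obs_cond = Y_obs[A_obs == 1].mean()
# E[death | the rule prescribes HAART]: matches E[death | do(HAART)],
# which is 0.2 + 0.5*0.5 - 0.1 = 0.35 by construction.
agent_cond = Y_new[A_new == 1].mean()
print(obs_cond, agent_cond)
```

The gap between the two conditional expectations is exactly the confounding bias; whether a full EDT derivation reproduces the second quantity from the observational data alone is the open part of the challenge.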
A challenge (not in a bad sense, I hope): I would be interested in seeing an EDT derivation of the right answer in this example, if anyone wants to do it.
Yeah, unfortunately everyone who responded to your question went all fuzzy in the brain and started philosophical evasive action.