I took the survey
Anders_H
As someone who gave up a career in medicine in order to get a doctoral degree in Causal Inference, I am half-upvoting this, because I really want it to be true :-)
I originally trained as a medical doctor, but came to the conclusion that what I was doing had almost no value on utilitarian grounds. Sure, once in a while you feel good about helping a patient, but really, if you weren’t working that day, somebody else would have done the same thing. I decided I would rather take my one-in-a-thousand chance of coming up with an original idea with real impact than spend the rest of my career as a doctor, where my utilitarian impact would almost certainly be negligible.
I came to the Harvard School of Public Health intent on going into academic Global Health, but after I took an introductory course on applied causal inference with some basic DAG theory, that all changed. Partly, this was because I recognized the importance of Causal DAGs from reading Less Wrong. I ended up staying at HSPH to get a doctoral degree with some of the leading researchers in the field; this even allowed me to take a course that Ilya was a Teaching Assistant for (I ended up being a TA for the same course the following year).
Currently, my career plan is to get a faculty job at some school of public health, where I see my mission as taking part in a “reboot” of epidemiology and comparative effectiveness research: to cleanse it of the cargo cult science and magical thinking that is currently all too common, and to train investigators in rigorous causal reasoning. I honestly believe that this could have a major utilitarian impact, because in the absence of randomized trials, proper causal reasoning about observational data is the only way we can learn how to make better clinical decisions that optimize patient outcomes.
(Hopefully, if I play my cards right, this career choice will also have the added benefit of giving me sufficient status in the medical community to get a real discussion started on some of the most horrific things that doctors do to patients.)
This note is for readers who are unfamiliar with The_Lion:
This user is a troll who has been banned multiple times from Less Wrong. He is unwanted as a participant in this community, but we are apparently unable to prevent him from repeatedly creating new accounts. Administrators have extensive evidence for sockpuppetry and for abuse of the voting system. The fact that The_Lion’s comment above is heavily upvoted is almost certainly entirely due to sockpuppetry. It does not reflect community consensus.
I also have an objective. My objective is this: At least somewhere on the internet, there should exist a community where people can have real discussion, i.e., a dispassionate exchange of priors, likelihood ratios, and arguments. It will not be possible for me to achieve my objective if participants turn discussions into wars. It will also not work if people with certain views feel unwelcome, or scared to vocalize their views.
Yes, he may have been acting rationally, in the same way that somebody who defects in Prisoner’s Dilemma acts rationally. In fact, it would be rational for anyone to use unacceptable tactics in order for their side to “win” the discussion. However, the continued existence of Less Wrong as a rationalist community depends on people cooperating in this game. Moloch will certainly kill the rationalist spirit if we don’t punish defectors.
Sometimes it is rational to punish defectors even if the defectors themselves are acting rationally. I do however understand that this is a difficult trade-off, as we have seen strong evidence that there are people who are willing to participate and have high-quality insights that are not easily obtained elsewhere, but who refuse to play by the rules.
I am going to publicly call for banning user VoiceOfRa for the following reasons:
(1) VoiceOfRa is almost certainly the same person as Eugine_Nier and Azathoth123. This is well known in rationality circles; many of us have been willing to give him a second chance under a new username because he usually makes valuable contributions.
(2) VoiceOfRa almost certainly downvote bombed the user who made the grandparent comment, including downvoting some very uncontroversial and reasonable comments.
(3) As I have said before in this context, downvote abuse is very clear evidence of being mindkilled. It is also a surefire way to ensure you never change your mind, because you discourage people who disagree with you from taking part in the discussion, and therefore prevent yourself from updating on their information. I do not understand how someone who genuinely believes in epistemic rationality could think this is a good strategy.
I will also note that I was the first person to publicly call out Eugine_Nier under his previous username, Azathoth123, at http://lesswrong.com/lw/l0g/link_quotasmicroaggressionandmeritocracy/bd4o . Like I said in that comment, I continue to believe he is a valuable contributor to the community. Like many other people, I have been willing to give him a second chance under his new username. However, this was conditional on completely ceasing and desisting with the downvote abuse. And yes, any downvoting of old comments made in a different context is a clear example of abuse.
The following links provide background material for readers who are unfamiliar with Eugine_Nier and the context in which I am requesting a ban:
http://lesswrong.com/r/discussion/lw/kbk/meta_policy_for_dealing_with_users/
http://lesswrong.com/lw/kfq/moderator_action_eugine_nier_is_now_banned_for/
http://lesswrong.com/lw/ld0/psa_eugine_nier_evading_ban/
Edited to add: If I see clear evidence that VoiceOfRa is not Eugine_Nier, or that he was not behind the most recent downvote abuse, I will retract this message and publicly apologize.
I don’t believe you can obtain an understanding of the idea that “correlation does not imply causation” from even a very deep appreciation of the material in Statistics 101. These courses usually make no attempt to define confounding, comparability, etc. If they try to define confounding, they tend to use incoherent criteria based on changes in the estimate. Any understanding is almost certainly going to have to originate from outside of Statistics 101; unless you take a course on causal inference based on directed acyclic graphs, it will be very challenging to get beyond memorizing the teacher’s password.
You will probably want to edit the title to add the qualifier “in mice”. Results from mouse models are notorious for not generalizing to humans. That said, this looks interesting; thanks for bringing it to my attention. I definitely hope this research gets all the funding it needs; it is certainly a bet worth taking even if the chance of payoff is low.
As the token epidemiologist in the Less Wrong community, I should probably comment on this.
The utility of learning epidemiology will depend critically on what you mean by the word:
If you interpret “epidemiology” as the modern theory of causal inference and causal reasoning applied to health and medicine, then learning epidemiology is very useful, so much so that I believe that a course on causal reasoning should be required in high school. If you are interested in learning this material, my advisor is writing a book on Causal Inference in Epidemiology, part of which is freely available at http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ . For more mathematically oriented readers, Pearl’s book is also great.
If you interpret “epidemiology” to mean the material you will learn when taking a course called “Epidemiology”, or to mean the methods used in most papers published in epidemiologic journals (i.e., endless Cox models, p-hacking, model selection algorithms, and incoherent reasoning about confounding), then what you will get is a broken epistemology with negative utility. Stay far away from this: people who don’t have the time to learn proper causal reasoning are better off with the heuristic “if it is not randomized, don’t trust it”. This happens to be the mindset of most clinicians, and appropriately so.
Last week, I gave a presentation at the Boston meetup, about using causal graphs to understand bias in the medical literature. Some of you requested the slides, so I have uploaded them at http://scholar.harvard.edu/files/huitfeldt/files/using_causal_graphs_to_understand_bias_in_the_medical_literature.pptx
Note that this is intended as a “Causality for non-majors” type presentation. If you need a higher level of precision, and are able to follow the maths, you would be much better off reading Pearl’s book.
(Edited to change file location)
The key issue is that we’re asking a counterfactual question. The question itself will be underdefined without the context of a causal model. The Russian roulette hypothetical is a good example: “Our goal is to find out what happens in Norway if everyone took up playing Russian roulette once a year”. What does this actually mean? Are we asking what would happen if some mad dictator forced everyone to play Russian roulette? Or if some Russian roulette social media craze caught on? Or if people became suicidal en-masse and Russian roulette became popular accordingly? These are different counterfactuals, and the answer will be different depending on which of these we’re talking about. We need the machinery of counterfactuals—and therefore the machinery of causal models—in order to define what we mean at all by “what happens in Norway if everyone took up playing Russian roulette once a year”. That counterfactual only makes sense at all in the context of a causal model, and is underdefined otherwise.
I absolutely agree that this is a counterfactual question. I am using the machinery of counterfactuals and causal models, just a different causal model from the one you and Pearl prefer. In this case, I had in mind a situation that is roughly equivalent to a mad dictator forcing everyone to play Russian roulette, but the underspecified details are not all that important to the argument I am making.
I assume by “unmeasured causes” you mean latent variables, i.e., variables in the causal graph which happen to not be observed. A causal diagram framework can handle latent variables just fine; there is no fundamental reason why every variable needs to be measured. Latent variables are a pain computationally, but they pose no fundamental problem mathematically.
This is straight up wrong, and on this particular point the causal inference establishment is on my side, not yours. For example, if there are backdoor paths that cannot be closed without conditioning on a latent variable, then the causal effect is not identified and there is no amount of computation that can get around this.
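To make that concrete, here is a toy sketch (not from the original exchange; everything in it is made up for illustration) of two structural models that agree on every observable quantity but disagree on the causal effect, precisely because the only backdoor path runs through a latent variable:

```python
import numpy as np

# DAG: X <- U -> Y and X -> Y, with U latent. Two structural models
# compatible with this graph that imply the SAME observational joint
# P(X, Y) but DIFFERENT interventional distributions P(Y | do(X)).

rng = np.random.default_rng(0)
n = 1_000_000

# Model A: the latent U drives everything; X has no causal effect on Y.
u = rng.integers(0, 2, n)
x_a, y_a = u, u
# Under do(X=1), Y is still just U, so P(Y=1 | do(X=1)) = 0.5.

# Model B: X is an unconfounded coin flip that fully determines Y.
x_b = rng.integers(0, 2, n)
y_b = x_b
# Under do(X=1), Y = 1 with certainty, so P(Y=1 | do(X=1)) = 1.0.

# Observationally the models are indistinguishable: X = Y always,
# and X is a fair coin in both.
print((x_a == y_a).mean(), (x_b == y_b).mean())  # 1.0, 1.0
print(x_a.mean(), x_b.mean())                    # ~0.5, ~0.5
# Since no observational quantity separates them, no amount of
# computation identifies P(Y | do(X)) without closing the backdoor
# path through the latent U.
```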
Indeed, much of machine learning consists of causal models with latent variables.
Much of machine learning gets causality wrong.
Whether the treatment has an effect does not seem relevant here at all.
It is relevant because it allows me to construct a very simple scenario where we have very strong intuition that extrapolation should work; yet Pearl’s selection diagram fails to make a prediction for the target population.
No. My intuition very strongly says that 100% of the relevant structural information/model can be directly captured by causal models, and that you’re just not used to encoding these sorts of intuitions into causal models. Indeed, counterfactuals are needed even to define what we mean, as in the Russian roulette example. The individual counterfactual distributions really are the thing we care about, and everything else is relevant only insofar as it approximates those counterfactual distributions in some situations.
I agree that you can encode all structural information in causal models. I do not agree that all structural information can be encoded in DAGs, which are one particular type of causal model. There are several kinds of background information about the causal structure that are essential for identifiability and that cannot be encoded on standard DAGs. For example, monotonicity is necessary for instrumental variable identification.
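To spell out that example: in the classic instrumental-variable result of Imbens and Angrist (1994), monotonicity is exactly the assumption that cannot be read off a standard DAG. Using superscripts for counterfactuals:

```latex
% Binary instrument Z and binary treatment X. Assuming relevance,
% exclusion, exchangeability of Z, and monotonicity (no defiers,
% i.e. X^{z=1} \geq X^{z=0} for everyone), the Wald ratio identifies
% the average causal effect among the compliers:
\mathbb{E}\!\left[ Y^{x=1} - Y^{x=0} \mid X^{z=1} > X^{z=0} \right]
  = \frac{\mathbb{E}[\,Y \mid Z=1\,] - \mathbb{E}[\,Y \mid Z=0\,]}
         {\mathbb{E}[\,X \mid Z=1\,] - \mathbb{E}[\,X \mid Z=0\,]}
```

Drop monotonicity and the graph stays exactly the same, but this identification fails; that is the sense in which the DAG alone cannot carry this piece of background knowledge.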
I am arguing that there is a special type of background information that is crucial for generalizability, and which cannot be encoded in Pearl/Bareinboim’s causal diagrams for transportability. I therefore proposed a non-DAG causal model which is able to use this background structural knowledge. The Russian roulette example is an attempt to illustrate the nature of this class of background knowledge.
This does not mean that it is impossible to make an extension of the causal DAG framework to encode the same information. I am just arguing that this is not what the Pearl/Bareinboim selection diagram framework does.
Overall, my impression is that you don’t actually understand how to build causal models, and you are very confused about their applicability and limitations.
I did specifically invoke Crocker’s Rules, so I’d like to thank you for this feedback.
Of course, I think you are wrong about this. I dislike appeals to authority, but I would like to point out that I have a doctoral degree in epidemiologic methodology from Harvard, and that my thesis advisors were genuine thought leaders in causal modelling. I also want to point out that both my papers on this topic have been reviewed by editors and peer-reviewers with a deep understanding of causal models.
This does not, of course, necessarily mean that you are wrong. It does, however, mean that I think you should adjust your priors and truly try to understand my argument before you reach such a strong posterior.
If you genuinely have found a flaw in my argument, I’d like you to state it explicitly rather than just claim that I don’t understand causal models. In a hypothetical world in which I am wrong, I would very much like to know about it, as it would allow me to move on and work on something else.
Solomonoff Induction is uncomputable, and implementing it will not be possible even in principle. It should be understood as an ideal which you should try to approximate, rather than something you can ever implement.
Solomonoff Induction is just bayesian epistemology with a prior determined by information theoretic complexity. As an imperfect agent trying to approximate it, you will get most of your value from simply grokking Bayesian epistemology. After you’ve done that, you may want to spend some time thinking about the philosophy of science of setting priors based on information theoretic complexity.
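As a toy illustration (the hypothesis space and bit lengths are made up for the example), here is what “a prior determined by information theoretic complexity” amounts to in practice: weight each hypothesis by 2^(-description length), then update by ordinary Bayes:

```python
import math

# Three hypotheses about a coin: (name, description length in bits, P(heads)).
hypotheses = [
    ("fair coin",        2, 0.5),
    ("always heads",     4, 1.0),
    ("biased 3/4 heads", 8, 0.75),
]

# Complexity prior: P(h) proportional to 2^(-length(h)).
prior = {name: 2.0 ** -bits for name, bits, _ in hypotheses}
total = sum(prior.values())
prior = {name: p / total for name, p in prior.items()}

# Observe eight heads in a row and update by Bayes' rule.
data = "HHHHHHHH"
posterior = {}
for name, _, p_heads in hypotheses:
    likelihood = math.prod(p_heads if c == "H" else 1 - p_heads for c in data)
    posterior[name] = prior[name] * likelihood
total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}

# The simplest hypothesis starts out favored, but data can overwhelm
# the complexity penalty:
print(posterior)  # "always heads" now dominates despite its lower prior
```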
As I explained in the FAQ, it’s logically impossible for “important” things to be underfunded.
Logically impossible? You chose the wrong website to make that argument. See 37 Ways Words Can Be Wrong, in particular The Parable of Hemlock.
I reduced my Erdős number from infinity to at most 4. See http://link.springer.com/article/10.1007/s40471-015-0045-5
Thank you! I know that you will find a lot of errors in this sequence, so please point them out whenever you see them.
The reason for the superscript is that this was originally written for students in the epidemiology department at HSPH, where superscript is the standard notation due to Prof. Hernán’s book. I didn’t want to change all the notation for the Less Wrong adaptation.
I am currently on my phone and will fix the definition of a confounder later tonight when I have access to a real computer. It is probably too early to give a definition before I introduce graphs.
Edited to add: The simplest DAG where the definition in the first sentence of Wikipedia fails is one where the suspected confounder is a mediator. I think the simplest example where the second definition on Wikipedia fails is M-bias. I will cover M-bias in Part 3 of the sequence.
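As a preview of Part 3, here is a minimal simulation sketch of the M-bias structure (X <- U1 -> Z <- U2 -> Y, with no effect of X on Y; the distributions are made up for illustration). Z is pre-treatment and associated with both exposure and outcome, so the Wikipedia-style definition flags it as a confounder, yet adjusting for it creates bias where none existed:

```python
import numpy as np

# M-bias: X <- U1 -> Z <- U2 -> Y, with U1, U2 latent and NO arrow X -> Y.
rng = np.random.default_rng(0)
n = 1_000_000
u1 = rng.normal(size=n)             # latent cause of X and Z
u2 = rng.normal(size=n)             # latent cause of Y and Z
x = u1 + rng.normal(size=n)
y = u2 + rng.normal(size=n)         # X has no effect on Y
z = u1 + u2 + rng.normal(size=n)    # collider between U1 and U2

# Z is associated with both X and Y, and is "pre-treatment":
print(np.corrcoef(x, z)[0, 1], np.corrcoef(y, z)[0, 1])  # both clearly nonzero

# Marginally, X and Y are independent, as the graph says:
print(np.corrcoef(x, y)[0, 1])      # ~0

# "Adjusting" for Z (here, crudely, by stratifying on z > 1) opens the
# collider path and manufactures a spurious X-Y association:
s = z > 1
print(np.corrcoef(x[s], y[s])[0, 1])  # clearly negative
```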
Neither. Obviously, the average excellence of “doctors, masters and bachelors” of the most renowned universities is higher than the average excellence of people who are self-taught. Nobody suggests that being self-taught correlates positively with excellence.
The quotation is still undoubtedly true, because there are many more individuals who are self-taught than individuals who have these credentials. It is also plausible that the variance in excellence among the self-taught is much higher. Therefore, it is trivial to identify self-taught individuals who are more knowledgeable than most highly credentialed university graduates.
In fact, as a doctoral student in applied causal inference at a fairly renowned university, I can identify several self-taught Less Wrong community members who understand causality theory better than I do.
No. This is not about interpretation of probabilities. It is about choosing what aspect of reality to rely on for extrapolation. You will get different extrapolations depending on whether you rely on a risk ratio, a risk difference or an odds ratio. This will lead to real differences in predictions for what happens under intervention.
Even if clinical decisions are entirely left to an algorithm, the algorithm will need to select a mathematical object to rely on for extrapolation. The person who writes the algorithm needs to tell the algorithm what to use, and the answer to that question is contested. This paper contributes to that discussion, and proposes a concrete solution, one that has been known for 65 years but never used in practice.
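Here is a toy version of the problem (all numbers made up): the same trial result gives three different predictions for the target population, depending on which effect measure the algorithm is told to treat as stable:

```python
# Study population: risk 0.10 untreated, 0.20 treated.
# Target population: risk 0.30 untreated. What is the treated risk?
p0_study, p1_study = 0.10, 0.20
p0_target = 0.30

# If the risk ratio is assumed stable:
rr = p1_study / p0_study
print("risk ratio stable:     ", p0_target * rr)             # 0.60

# If the risk difference is assumed stable:
rd = p1_study - p0_study
print("risk difference stable:", p0_target + rd)             # 0.40

# If the odds ratio is assumed stable:
def odds(p):
    return p / (1 - p)

or_study = odds(p1_study) / odds(p0_study)
target_odds = odds(p0_target) * or_study
print("odds ratio stable:     ", target_odds / (1 + target_odds))  # ~0.49
```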
The one-year embargo on my doctoral thesis has been lifted, it is now available at https://dash.harvard.edu/bitstream/handle/1/23205172/HUITFELDT-DISSERTATION-2015.pdf?sequence=1 . To the best of my knowledge, this is the first thesis to include a Litany of Tarski in the introduction.
There is a lot of interest in prediction markets in the Less Wrong community. However, the prediction markets that we have are currently only available in meatspace, they have very low volume, and the rules are not ideal (You cannot leave positions by selling your shares, and only the column with the final outcome contributes to your score).
I was wondering if there would be interest in a prediction market linked to Less Wrong accounts? The idea is that we use essentially the same structure as Intrade/Ipredict. We use play money: this can either be Karma or a new “currency” where everyone is assigned the same starting value. If we use a currency other than Karma, your balance would be publicly linked to your account, as an indicator of your predictive skills.
Perhaps participants would have to reach a specified level of Karma before they are allowed to participate, to avoid users setting up puppet accounts to transfer points to their actual accounts.
I think such a prediction market would act as a tax on bullshit, it would help aggregate information, it would help us identify the best predictors in the community, and it would be a lot of fun.
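To make the proposal concrete, here is a minimal sketch of the mechanics (every name and number is hypothetical, and a real implementation would also need an order book to match each trade with a counterparty):

```python
# Play-money balances, binary contracts that pay 1 point if the event
# resolves true, and the ability to exit a position by selling.

class Market:
    def __init__(self, question):
        self.question = question
        self.positions = {}  # user -> number of YES shares (negative = short)

    def trade(self, balances, user, shares, price):
        """Buy (shares > 0) or sell (shares < 0) YES shares at `price` in (0, 1)."""
        cost = shares * price
        if balances[user] < cost:
            raise ValueError("insufficient balance")
        balances[user] -= cost
        self.positions[user] = self.positions.get(user, 0) + shares

    def resolve(self, balances, outcome):
        """Pay 1 point per YES share if the outcome is True, 0 otherwise."""
        for user, shares in self.positions.items():
            balances[user] += shares * (1 if outcome else 0)
        self.positions.clear()

# Everyone starts with the same balance, as proposed above.
balances = {"alice": 100.0, "bob": 100.0}
m = Market("Will X happen by 2016?")
m.trade(balances, "alice", 10, 0.60)   # alice buys 10 YES at 0.60
m.trade(balances, "alice", -10, 0.80)  # later exits by selling at 0.80
m.resolve(balances, outcome=True)
print(balances)  # alice: 102.0, bob: 100.0
```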
Identifiability, sure. But latents still aren’t a problem for either extrapolation or model testing, as long as we’re using Bayesian inference. We don’t need identifiability.
I am not using Bayesian inference, and neither are Pearl and Bareinboim. Their graphical framework (“selection diagrams”) is very explicitly set up as a model for reasoning about whether the causal effect in the target population is identified in terms of observed data from the study population and observed data from the target population. Such identification may succeed or fail depending on latent variables and depending on the causal structure of the selection diagram.
I am confident that Pearl and Bareinboim would not disagree with me about the preceding paragraph. The point of disagreement is whether there are realistic ways to substantially reduce the set of variables that must be measured, by using background knowledge about the causal structure that cannot be represented on selection diagrams.
The obvious causal model for the Russian roulette example is one with four nodes:
first node indicating whether roulette is played
second node, child of first, indicating whether roulette killed
third node, child of second, indicating whether some other cause killed (can only happen if the person survived roulette)
fourth node, death, child of second and third node
This makes sense physically, has a well-defined counterfactual for Norway, and produces the risk difference calculation from the post. What information is missing?
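For concreteness, a minimal simulation of this model (the 1/6 chamber probability is from the example; the rate of death from other causes is a made-up stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
p_other = 0.01  # hypothetical one-year risk of death from all other causes

def simulate(play):
    # node 1: whether roulette is played (set by intervention)
    # node 2: whether roulette killed (1/6 chance, only if played)
    roulette_killed = play & (rng.random(n) < 1 / 6)
    # node 3: whether some other cause killed (only among roulette survivors)
    other_killed = ~roulette_killed & (rng.random(n) < p_other)
    # node 4: death, child of nodes 2 and 3
    return roulette_killed | other_killed

play = simulate(np.ones(n, dtype=bool))
no_play = simulate(np.zeros(n, dtype=bool))
print(play.mean() - no_play.mean())  # risk difference ~= (1/6) * (1 - p_other)
```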
In my model of reality (and I am sure, in most other people’s model of reality), the third node has a wide range of unobserved latent ancestors. If the goal is to make inferences about the effect of Russian roulette in Russia using data from Russia, your analytic objective will be to find a set of nodes that d-separate the first node from the fourth node. You do not need to condition on the latent causes of the third node to achieve this (because those latent variables are not also causes of the first node; they cannot be, because the first node was randomized). The identification formula for the effect in Russia is therefore invariant to whether the latent causes of the third node are represented on the graph or not, and you therefore do not have to show them. The DAG model then represents a huge equivalence class of causal models; you can be agnostic between causal models within this equivalence class because the inferences are invariant between them.
But if the goal is to make predictions about the effect in Norway using data from Russia, these latent variables suddenly become relevant. The goal is no longer to d-separate the fourth node from the first node, but to d-separate the fourth node from an indicator for whether a person lives in Russia or Norway. In the true data generating mechanism (i.e., in the reality that the model is trying to represent), there almost certainly are a substantial number of open paths between the indicator for whether a person lives in Norway or Russia and their risk of death. The only possible identification formula for the effect in Norway includes terms for distributions that are conditional on the latent variables. The effect in Norway is therefore not identified from the Russian data.
The underlying structure of reality is still a DAG; it’s only our information about reality which will be non-DAG-shaped. DAGs show the causal structure.
I agree that reality is generated by a structure that looks something like a directed acyclic graph. But that does not mean that all significant aspects of reality can be modeled using Pearl’s specific operationalization of causal DAGs/selection diagrams.
Any attempt to extrapolate from Russia to Norway is going to depend on a background belief that some aspect of the data generating structure is equal between the countries. In the case of Russian roulette, I argue that the natural choice of mathematical object on which to hang our claims of structural equality is the parameter that takes the value 5/6 in both countries.
In DAG terms, you can think of the data generating mechanism for node 4 as responding to a property of the path 1 -> 2 -> 4. In particular, this path forces the quantities Pr(fourth node = 0 | do(first node = 1)) and Pr(fourth node = 0 | do(first node = 0)) to be related by a factor of 5/6 in both countries. Reality still has a DAG structure, but you won’t find a way to encode the figure 5/6 in a causal model based only on selection diagrams. Without a way to encode a parameter that takes the value 5/6, you have to take a long detour where you collect a truckload of data and measure all the latent variables.
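Written out, the invariance claim and the resulting extrapolation are:

```latex
% The parameter being transported is the survival factor itself:
\Pr\big(\text{alive} \mid do(\text{play})\big)
  = \tfrac{5}{6} \cdot \Pr\big(\text{alive} \mid do(\text{don't play})\big)
  \quad \text{(in both Russia and Norway)}
% so the Norwegian effect follows from Norwegian baseline survival alone:
\Pr\nolimits_{\text{Norway}}\big(\text{death} \mid do(\text{play})\big)
  = 1 - \tfrac{5}{6}\Big(1 - \Pr\nolimits_{\text{Norway}}\big(\text{death} \mid do(\text{don't play})\big)\Big)
```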
I completed the survey (and learned surprising things about my digit ratio).