IlyaShpitser comments on The Statistician’s Fallacy

IlyaShpitser 11 Dec 2013 16:37 UTC
23 points

The professor had a background in statistics, and as far as I could tell knew her stuff in that area (though she dismissed Bayesianism in favor of orthodox statistics).

Bayesians will realize that, since there’s a good chance of that happening even when the conclusion is correct and well-supported by the evidence, finding mistakes in the statistics is only weak evidence that the conclusion is wrong.

Wow, lesswrong, you just never fail to do this at every opportunity. Bayesianity is not a minority view anymore. Bayesians do not have a monopoly on correct reasoning with probabilities. Seriously, knock it off, please.

The professor had a background in statistics

Do you have a background in statistics, Chris?

edit: One of the areas I am working on is “causal discovery,” which is learning the structure of graphs from observational data. One problem I have worked on a lot is causal discovery in the presence of hidden variables. It turns out there is a very interesting statistical model that recovers all independence constraints that a hidden variable DAG imposes on the observed margin. It also turns out that there is a way to write down the likelihood for this model in the case of discrete state spaces, while doing the same for continuous state spaces is currently unknown. This suggests that a search and score method (e.g. Bayesian method, or at least a method with a Bayesian justification) is natural for the discrete case, while a method based on hypothesis testing (e.g. a frequentist method, although Bayesian versions are possible here, they are less satisfactory because there is no global posterior) is natural for the continuous case. After all, we can’t very well figure out what the posterior is if we can’t even write the likelihood down.

Did the above paragraph make sense to you? These are the kinds of considerations people have in mind when thinking about B vs F. If you aren’t working in ML/stats I am not sure what the point even is of having an opinion on this topic, other than “belief as attire.”

It’s completely bizarre. Somehow when it comes to B vs F, LW is willing to tell experts what they should be doing in their area of expertise.
- Vaniver 11 Dec 2013 22:33 UTC
  7 points
  Parent
  Ilya, I’m curious what your thoughts on Beautiful Probability are.
  
  Personally, I flinch whenever I get to the “accursèd frequentists” line. But beyond that I think it does a decent job of arguing that Bayesians win the philosophy of statistics battle, even if they don’t generate the best tools for any particular application. And so it seems to me that in ML or stats, where the hunt is mostly for good tools instead of good laws, having the right philosophy is only a bit of a help, and can be a hindrance if you don’t take the ‘our actual tools are generally approximations’ part seriously.
  
  In this particular example, it seems to me that ChrisHallquist has a philosophical difference with his stats professor, and so her not being Bayesian is potentially meaningful. I think that LW should tell statisticians that they shouldn’t believe cell phones cause cancer, even if they shouldn’t tell them what sort of conditional independence tests to use when they’re running PC on a continuous dataset.
  - IlyaShpitser 12 Dec 2013 0:02 UTC
    5 points
    Parent
    Well, I am no Larry Wasserman.
    
    But it seems to me that Bayesians like to make ‘average case’ statements based on their posterior, and frequentists like to make ‘worst case’ statements using their intervals. In complexity theory average and worst case analysis seem to get along just fine. Why can’t they get along here in probability land?
    
    I find the philosophical question ‘what is probability?’ very boring.
    
    Unrelated comment: the issue does not arise with PC, because PC learns fully observable DAG models, for which we can write down the likelihood just fine even in the continuous case. So if you want to be Bayesian w/ DAGs, you can run your favorite search and score method. The problem arises when you get an independence model like this one:
    
    { p(a,b,c,d) | A marginally independent of B, C marginally independent of D (and no other independences hold) }
    
    which does not correspond to any fully observable DAG, and you don’t think your continuous-valued data is multivariate normal. I don’t think anyone knows how to write down the likelihood for this model in general.
    - Vaniver 12 Dec 2013 0:39 UTC
      4 points
      Parent
      
      Why can’t they get along here in probability land?
      
      Agreed.
      
      the issue does not arise with PC, because PC learns fully observable DAG models, for which we can write down the likelihood just fine even in the continuous case.
      
      Correct; I am still new to throwing causality discovery algorithms at datasets and so have not developed strong mental separations between them yet. Hopefully I’ll stop making rookie mistakes like that soon (and thanks for pointing it out!).
  - EHeller 11 Dec 2013 23:34 UTC
    0 points
    Parent
    While I’m not Ilya, I find the ‘beautiful probability’ discussion somewhat frustrating.
    
    Sure, if we test different hypotheses with the same low sample data, we can get different results. However, starting from different priors, we can also get different results with that same data. Bayesianism won’t let you escape the problem, which is ultimately a problem of data volume.
    - alex_zag_al 13 Dec 2013 0:03 UTC
      0 points
      Parent
      LW (including myself) is very influenced by ET Jaynes, who believed that for every state of knowledge, there’s a single probability distribution that represents it. Therefore, you’d only get different results from the same data if you started with different knowledge.
      
      It makes a lot of sense for your conclusions to depend on your knowledge. It’s not a problem.
      
      Finding the prior that represents your knowledge is a problem, though.
      - EHeller 13 Dec 2013 0:50 UTC
        1 point
        Parent
        I’ve read Jaynes (I used to spend long hours trying to explain to a true-believer why I thought MaxEnt was a bad approach to out-of-equilibrium thermo), but my point is that for small sample data, assumptions will (of course) matter. For our frequentist, this means that the experimental specification will lead to small changes in confidence intervals. For the Bayesian this means that the choice of the prior will lead to small changes in credible intervals.
        
        Neither is wrong, and neither is “the one true path”; they are different, equally useful approaches to the same problem.
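
        A small illustration of both sensitivities, using made-up data (9 successes and 3 failures) and hypothetical function names. On the frequentist side, the one-sided p-value against theta = 0.5 depends on whether the design fixed the 12 trials in advance or sampled until the third failure. On the Bayesian side, the posterior mean depends on which Beta prior is chosen. This is a sketch under those assumptions, not a general recipe.

        ```python
        from math import comb

        def p_value_fixed_n(successes, failures, theta0=0.5):
            """One-sided p-value when the number of trials was fixed in advance."""
            n = successes + failures
            return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
                       for k in range(successes, n + 1))

        def p_value_stop_at_failures(successes, failures, theta0=0.5):
            """One-sided p-value when sampling continued until the last failure.

            Seeing at least `successes` successes before the final failure is the
            same event as seeing at most `failures - 1` failures in the first
            `successes + failures - 1` trials.
            """
            m = successes + failures - 1
            return sum(comb(m, j) * (1 - theta0)**j * theta0**(m - j)
                       for j in range(failures))

        def posterior_mean(successes, failures, a, b):
            """Posterior mean of theta under a Beta(a, b) prior (assumed prior family)."""
            return (a + successes) / (a + b + successes + failures)

        # Illustrative data only: 9 successes, 3 failures.
        p_fixed = p_value_fixed_n(9, 3)
        p_stop = p_value_stop_at_failures(9, 3)
        flat_mean = posterior_mean(9, 3, a=1, b=1)         # uniform prior
        informative_mean = posterior_mean(9, 3, a=5, b=5)  # prior concentrated near 0.5

        print(p_fixed, p_stop)
        print(flat_mean, informative_mean)
        ```

        With these twelve observations the two designs land on opposite sides of the conventional 0.05 threshold, and the two priors give visibly different posterior means. The influence of the prior on the posterior mean shrinks as the counts grow.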
  - V_V 17 Dec 2013 4:22 UTC
    −2 points
    Parent
    “<Jaynes quote> … If Nature is one way, the likelihood of the data coming out the way we have seen will be one thing. If Nature is another way, the likelihood of the data coming out that way will be something else. But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher’s private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information, or simply do the wrong calculation, for the two methods to arrive at different answers.”
    
    This seems to be wrong.
    EY makes a sort of dualistic distinction between “Nature” (with a capital “N”) and the researcher’s mental state. But what EY (and possibly Jaynes, though I can’t tell from a short quote) is missing is that the researcher’s mental state is part of Nature, and in particular is part of the stochastic processes that generate the data for these two different experimental settings. Therefore, any correct inference technique, frequentist or Bayesian, must treat the two scenarios differently.
    - Vaniver 17 Dec 2013 5:32 UTC
      2 points
      Parent
      The point that EY is making there is kind of subtle. Think about it this way:
      
      There’s a hidden double selected uniformly at random that’s between 0 and 1. You can’t see what it is; you can only press a button to see a 1 if another randomly selected double (over the same range) is higher than it, or 0 if the new double is less than or equal to it.
      
      One researcher says “I’m going to press this button 100 times, and then estimate what the hidden double is.” The second researcher says “I’m going to press this button until my estimate of the double is at most .4.” Coincidentally, they see the exact same sequence of 100 presses, with 70 1s.
      
      The primary claim is that the likelihood ratio from seeing 70 1s and 30 0s is the same for both researchers, and this seems correct to me. (How can the researcher’s intention change the hidden double?) The secondary claim is that the second researcher receives no additional information from the potentially surprising fact that he required 100 presses under his decision procedure. I have not put enough thought into it to determine whether or not the secondary claim is correct, but it seems likely to me that it is.
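
      A minimal sketch of the primary claim, assuming (this is not spelled out above) that the second researcher’s stopping rule depends only on the presses seen so far. Under that assumption the rule contributes a factor that does not involve the hidden double, so both designs give the same likelihood ratio for any two candidate values. The function names and the candidate values 0.3 and 0.4 are illustrative.

      ```python
      from math import comb, log

      def log_lik_fixed_n(h, ones, zeros):
          """Log-likelihood of the counts when the number of presses was fixed in advance.

          A press shows 1 when the fresh uniform draw exceeds the hidden double h,
          so P(1) = 1 - h and P(0) = h.
          """
          return (log(comb(ones + zeros, ones))
                  + ones * log(1 - h) + zeros * log(h))

      def log_lik_sequence(h, ones, zeros):
          """Log-likelihood of one specific ordered sequence of presses.

          Assumption: a stopping rule that looks only at the presses so far
          multiplies this by a factor that does not depend on h, so the factor
          is left out here.
          """
          return ones * log(1 - h) + zeros * log(h)

      def log_likelihood_ratio(log_lik, h1, h2, ones=70, zeros=30):
          """Log of the likelihood ratio favouring h1 over h2 for the given counts."""
          return log_lik(h1, ones, zeros) - log_lik(h2, ones, zeros)

      ratio_fixed = log_likelihood_ratio(log_lik_fixed_n, 0.3, 0.4)
      ratio_sequential = log_likelihood_ratio(log_lik_sequence, 0.3, 0.4)

      print(ratio_fixed, ratio_sequential)
      ```

      The binomial coefficient appears in both the numerator and the denominator of the first ratio and cancels, which is why the two numbers agree. Nothing here addresses the secondary claim about what the stopping time itself tells the second researcher.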
      - V_V 17 Dec 2013 6:14 UTC
        0 points
        Parent
        Split the researchers that generate the data from the reasoner who is trying to estimate the hidden double from the data.
        
        What is the data that the estimator receives? There is clearly a string of 100 bits indicating the results of the comparisons, but there is also another datum which indicates that the experiment was stopped after 100 iterations. This is a piece of evidence which must be included in the model, and the way to include it depends on the estimator’s knowledge of the stopping criterion used by the data generator.
        
        The estimator has to take into account the possibility of cherry picking.
        
        EDIT:
        
        I think I can use an example:
        
        Suppose that I give you N =~ 10^9 bits of data generated according to the process you describe, and I declare that I had precommitted to stop gathering data after exactly N bits. If you trust me, then you must believe that you have an extremely accurate estimate of the hidden double. After all, you are using 1 gigabit of data to estimate less than 64 bits of entropy!
        
        But then you learn that I lied about the stopping criterion, and I had in fact precommitted to stop gathering data at the point that it would have fooled you into believing with very high probability that the hidden number was, say, 0.42.
        
        Should you update your belief on the hidden double after hearing of my deception? Obviously you should. In fact, the observation that I gave you so much data now makes the estimate extremely suspect, since the more data I give you the more I can manipulate your estimate.
        Vaniver 17 Dec 2013 8:26 UTC
        0 points
        Parent
        So, suppose I know the stopping criterion and the number of button presses that it took to stop the sequence, but I wasn’t given the actual sequence.
        
        It seems to me like I can use the two of those to recreate the sequence, for a broad class of stopping criteria. “If it took 100 presses, then clearly it must be 70 1s and 30 0s, because if it had been 71 1s and 29 0s he would have stopped then and there would be only 99 presses, but he wouldn’t have stopped at 69 1s and 30 0s.” I don’t think I have any additional info.
        
        Should you update your belief on the hidden double after hearing of my deception? Obviously you should.
        
          Update it to what? Assuming that the data is not tampered with, just that your stopping criterion was pointed at a particular outcome, it seems that unless the double is actually very close to 0.42, you are very unlikely to ever stop!* It looks like the different stopping criteria impose conditions on the order of the dataset, but the order is independent of the process that generates whether each bit is a 1 or a 0, and thus should be independent of my estimate of the hidden double.
        
          * If you imagine multiple researchers, each of whom gets a different sequence, and I only hear from some of them, then yes, it seems like selection bias is a problem. But the specific scenario under consideration is two researchers with identical experimental results drawing different inferences from those results, which is different from two researchers with differing experimental setups having different distributions of possible results.
    - Watercressed 17 Dec 2013 5:06 UTC
      0 points
      Parent
      Different information about part of nature is not sufficient to change an inference—the probabilities could be independent of the researcher’s intentions.
      - V_V 17 Dec 2013 6:25 UTC
        0 points
        Parent
        The posterior probability of the observed data given the hidden variable of interest is in general not independent from the intentions of the researcher who is in charge of the data generation process.
- ChrisHallquist 11 Dec 2013 21:19 UTC
  −4 points
  Parent
  Oh my god! You’re right! How dare people on LessWrong mention their support for a view that’s been argued for at great length on here! The horror! The horror!
  
  I could write a long reply here… but I’m short on time, so I’ll just point out you’re making a lot of assumptions here, including about the details of what my professor said about Bayesianism.
  - Vaniver 11 Dec 2013 22:13 UTC
    6 points
    Parent
    
    Oh my god! You’re right! How dare people on LessWrong mention their support for a view that’s been argued for at great length on here! The horror! The horror!
    
    I’m ambivalent about this, actually. I think that interpreting probability as subjective uncertainty is more sensible, I think the Bayesian toolkit is superior for a wide array of problems, and I think that any technique generally called ‘Frequentist’ probably has an equivalent Bayesian interpretation.
    
    But I generally get the sense that the Bayesian-Frequentist divide is unproductive, and I am disappointed when LW seems to widen that divide.
    
    Unless you’re an active statistician, I don’t think you should expect you have a solid view of what statisticians currently think. Consider this other comment, where JoshuaZ points out that, at this point, Bayesianism is part of ‘orthodox statistics,’ and if you meant to say she was a Frequentist you should have said that directly.
    - [deleted] 12 Dec 2013 0:27 UTC
      6 points
      Parent
      
      But I generally get the sense that the Bayesian-Frequentist divide is unproductive, and I am disappointed when LW seems to widen that divide.
      
      Exactly; I couldn’t have put it better myself (and indeed, didn’t).
  - [deleted] 11 Dec 2013 22:00 UTC
    3 points
    Parent
    What your professor actually said about Bayesianism is irrelevant; GP is responding to what you said about your professor.
    
    Even before the edit, Ilya had a valid point; after the edit, you look like someone whose identity as a Bayesian has gotten in the way of thinking. No matter what you do, cut toward your enemy.
    - ChrisHallquist 11 Dec 2013 23:59 UTC
      −1 points
      Parent
      I replaced “orthodox statistics” with “frequentism” in the post in case that will make people happy, but as I understood him, Ilya wasn’t just complaining about that, but also about my own implied support for Bayesianism over frequentism. And maybe the standard LessWrong position on that debate is wrong, but to come in and announce that the LW view is wrong without argument, when it’s been argued for at such great length, seems odd, to put it mildly.
      
      Ilya comes across as not being aware of how much Eliezer and other people here have written about that debate. In fact, it’s not even clear to me if he understands what someone like Eliezer (or for that matter, an academic epistemologist) means when they say “Bayesianism.”
      - Mayo 13 Dec 2013 3:17 UTC
        5 points
        Parent
        I realize Eliezer holds great sway on this blog, but I think people here ought to question a bit more closely some of his most winning arguments in favor of casting out frequentism for Bayesianism. I’ve only read this blog around 4 times, and each time I’ve found a howler apparently accepted. But putting those aside, I find it curious that the results on psychological biases that are given so much weight on this blog are arrived at and affirmed by means of error statistical methodology. errorstatistics.com
        gwern 13 Dec 2013 4:45 UTC
        8 points
        Parent
        
        But putting those aside, I find it curious that the results on psychological biases that are given so much weight on this blog are arrived at and affirmed by means of error statistical methodology.
        
        Speaking as one of the LWers who has spent a fair bit of time reading up on both the heuristics & biases literature and also the problems & misuse of NHST (although I certainly couldn’t compare to your general statistical expertise), my position is basically that there’s no available literature which has examined the H&B topic with a superior methodology (so there’s no alternative we could use) and that on the whole H&B has found real effects despite the serious weaknesses in the methodology. For example, of the Reproducibility Project’s 13 targets, the ones which failed to replicate were priming effects and not the tested H&B effects (e.g. sunk costs, anchoring, framing). The problems are not so bad as to drain the H&B results of all validity, just some.
        
        So while the H&B research program is no doubt undermined and hampered by the statistical tools and practices of the researchers involved, there seems little reason to think that the most-discussed biases are pure statistical mirages; and so they are entirely relevant to our discussions here.
        
        (From my perspective, the real question about the utility of the H&B literature to our practical discussions here on LW is not whether they exist in the lab settings they are studied in—it’s clear that they are not artifacts of p-value hacking or anything like that—but whether they operate in the real world to a meaningful extent and shape opinions & actions on a wide scale and on the topics we care about. This is, unfortunately, something which is very difficult to study no matter what methodology one might choose to use, and for this concern, criticizing the use of error statistical methodology is largely irrelevant.)
      - [deleted] 12 Dec 2013 0:26 UTC
        2 points
        Parent
        
        [They were also complaining about] my own implied support for Bayesianism over frequentism.
        
        I don’t see that anywhere. It’s clear the majority of LessWrong (the actual subject of Ilya’s actual sentences) thinks Bayesian statistics (who was talking about epistemology?) is better—with the possible exception of gwern, who uses whatever is most pragmatic (and who I personally think is the actual winner of this debate).
        
        Ilya comes across as not being aware of how much Eliezer and other people here have written about that debate. In fact, it’s not even clear to me if he understands what someone like Eliezer (or for that matter, an academic epistemologist) means when they say “Bayesianism.”
        
        Even a casual examination of their comment record (or, alternatively, a Google search) would have demonstrated that you’re completely wrong in your assessment. I don’t know any other regular on the site that knows more about statistics.
        Cyan 13 Dec 2013 4:40 UTC
        7 points
        Parent
        
        I don’t know any other regular on the site that knows more about statistics.
        
        Allow me to introduce myself. I make my living as a biostatistician; I am philosophically a Jaynesian, in practice a statistical ecumenist. (I don’t know and don’t claim to know causal inference in anything like the depth that Ilya does—that’s his specialty, just like mine is Bayesian modeling.)
        [deleted] 13 Dec 2013 18:35 UTC
        2 points
        Parent
        Nice to meet ya! Consider myself updated :)