I’m not seeing why what you call “the real WTF” is evidence of a problem with frequentist statistics. The fact that the hypothesis test would have given a statistically insignificant p-value whatever the actual 6 data points were just indicates that, whatever the population distributions, 6 data points are simply not enough to disconfirm the null hypothesis. In fact you can see this if you look at Mann & Whitney’s original paper! (See the n=3 subtable in table I, p. 52.)
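To make the arithmetic explicit: with two samples of size 3 there are only C(6,3) = 20 equally likely ways to assign ranks under the null, so even the most extreme possible outcome has a one-sided p-value of 1/20 = 0.05 (0.10 two-sided). A few lines of Python confirm this by brute force:

```python
# Enumerate every way 3 of the 6 pooled ranks could belong to one group.
from itertools import combinations

assignments = list(combinations(range(1, 7), 3))   # C(6,3) = 20 assignments,
                                                   # all equally likely under H0
rank_sums = [sum(a) for a in assignments]
extreme = min(rank_sums)                           # most lopsided outcome: ranks 1, 2, 3
p_one_sided = sum(s <= extreme for s in rank_sums) / len(assignments)
print(len(assignments), p_one_sided)               # 20 0.05 -> two-sided p = 0.10
```

So no conceivable data could have reached significance at the 0.05 level (two-sided) with this design.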
I can picture someone counterarguing that this is not immediately obvious from the details of the statistical test, but I would hope that any competent statistician, frequentist or not, would be sceptical of a nonparametric comparison of means for samples of size 3!
I’m an econometrician by training, and when I was taught non-parametric testing I was told the minimum sample size needed to get a useful result was 10. Either the authors of the article had forgotten this, or there is something very wrong with how they were taught this test.
Thanks for the pointer to the original paper.

I’m not seeing why what you call “the real WTF” is evidence of a problem with frequentist statistics.
Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don’t have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.
I would hope that any competent statistician, frequentist or not, would be sceptical of a nonparametric comparison of means for samples of size 3!
Me too. But not all papers with shoddy statistics are sent to statisticians for review. Experimental biologists in particular have a reputation for math-phobia. (Does the fact that when I saw the sample size the word “underpowered” instantly jumped into my head count as evidence that I am competent?)
Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don’t have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.
I agree that frequentist statistics are often poorly taught and understood, and that this holds however you like to do your statistics. Still, the main post feels to me like a sales pitch for Bayes brand chainsaws that’s trying to scare me off Neyman-Pearson chainsaws by pointing out how often people using Neyman-Pearson chainsaws accidentally cut off a limb with them. (I am aware that I may be the only reader who feels this way about the post.)
(Does the fact that when I saw the sample size the word “underpowered” instantly jumped into my head count as evidence that I am competent?)
Yes, but it is not sufficient evidence to reject the null hypothesis of incompetence at the 0.05 significance level. (I keed, I keed.)

(I am aware that I may be the only reader who feels this way about the post.)

I get that impression a lot around here.
Still, the main post feels to me like a sales pitch...
It’s a fair point; I’m not exactly attacking the strongest representative of frequentist statistical practice. My only defense is that this actually happened, so it makes a good case study.

That’s true, and having been reminded of that, I think I may have been unduly pedantic about a fine detail at the expense of the main point.

It’s a good case study, but it’s not evidence of a problem with frequentist statistics.
I assert that it is evidence in my concluding paragraph, but it’s true that I don’t give an actual argument. Whether one counts it as evidence would seem to depend on the causal assumptions one makes about the teaching and practice of statistics.

Perhaps it’s frequentist evidence against frequentist statistics.

I think this is just a glib rejoinder, but if there’s a deeper thought there, I’d be interested to hear it.
The critique of frequentist statistics, as I understand it—and I don’t think I do—is that frequentists like to count things, and trust that having large sample sizes will take care of biases for them. Therefore, a case in which frequentist statistics co-occurs with bad results counts against use of frequentist statistics, and you don’t have to worry about why the results were bad.
The whole Bayesian vs. frequentist argument seems a little silly to me. It’s like arguing that screws are better than nails. It’s true that, for any particular joint you wish to connect, a screw will probably hold it more securely and reversibly than a nail. That doesn’t mean there’s no use for nails.
I think that, in this case, the underlying problem was not caused by the way frequentist statistics are commonly taught and practiced by working scientists:

In the present case, the null hypothesis is that the old method and the new method produce data from the same distribution; the authors would like to see data that do not lead to rejection of the null hypothesis.

I’m no statistician, but I’m pretty sure you’re not supposed to make your favored hypothesis the null hypothesis. That’s a pretty simple rule and I think it’s drilled into students and enforced in peer review.

I see that as the underlying problem because it reverses the burden of proof. If they had done it the right way around, six data points would have been not enough to support their method instead of being not enough to reject it. Making your favored hypothesis the null hypothesis can allow you, in the extreme, to rely on a single data point.
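To see how completely this reverses the burden of proof, here is a small simulation (a sketch using scipy’s mannwhitneyu; the distributions are invented for illustration). Even when the two methods differ by five standard deviations, a two-sided Mann-Whitney test at the 0.05 level can never reject with three observations per group, so the favored null hypothesis is guaranteed to “survive”:

```python
# Simulate grossly different "old" and "new" methods, 3 observations each,
# and count how often the Mann-Whitney test rejects at alpha = 0.05.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
trials, rejections = 10_000, 0
for _ in range(trials):
    old = rng.normal(0.0, 1.0, size=3)   # "old method"
    new = rng.normal(5.0, 1.0, size=3)   # "new method", wildly different
    _, p = mannwhitneyu(old, new, alternative="two-sided")
    rejections += p < 0.05
print(rejections / trials)               # 0.0 -- the test can never reject
```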
In the OP I did refer to that when I wrote:

Now even from a frequentist perspective, this is wacky. Hypothesis testing can reject a null hypothesis, but cannot confirm it, as discussed in the first paragraph of the Wikipedia article on null hypotheses.
You wrote:
That’s a pretty simple rule and I think it’s drilled into students and enforced in peer review.
Not all papers are reviewed by people who know the rule. I was taught that rule over ten years ago, and I didn’t remember it when my colleague showed me the analysis. (I did recall it eventually, just after I ran the sanity check. Evidence against my competence!) My colleague whose job it was to review the paper didn’t know/recall the rule either.
Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don’t have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.
Well, I don’t see the structural problems. (I don’t even know what a structural problem is.)
Somebody, please write a top-level post addressing this. Stop saying “Frequentists are bad” and leaving it at that. This is a great story, but it’s not valid argumentation to try to convert it into an anti-frequentist tract.
I’d love to see a top-level post where someone suggests the best and/or most realistic way for scientists to do their statistics. I’m actually rather ignorant with regard to probability theory. I got a D in second-semester frequentist statistics (hard teacher + I didn’t go to class or try very hard on the homework), which is indicative of how little I learned in that class. I did better in my applied statistics classes.
When is it good for scientists to do null hypothesis testing?

What specifically is the “this” you want addressed? I’m not sure what its referent is.

Right—show us how you would have done this test correctly using Bayesian statistics.

That did come up in comments; you can find the discussion here.
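For anyone who doesn’t want to chase the link, here is one minimal version of what a Bayesian reanalysis could look like. This is only a sketch, not necessarily the approach from that discussion: it assumes a normal model with Jeffreys priors (under which each group mean has a shifted, scaled Student-t posterior), uses placeholder numbers since the paper’s actual data aren’t given here, and a hypothetical equivalence margin eps that would need domain knowledge to set:

```python
# Posterior probability that two methods agree to within a tolerance eps,
# under independent normal models with Jeffreys priors p(mu, sigma^2) ~ 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)

def posterior_mean_samples(data, size=100_000):
    """Draw from the posterior of the mean: xbar + (s / sqrt(n)) * t_{n-1}."""
    data = np.asarray(data, dtype=float)
    n, xbar, s = len(data), data.mean(), data.std(ddof=1)
    return xbar + (s / np.sqrt(n)) * rng.standard_t(df=n - 1, size=size)

old = [3.1, 2.7, 3.4]   # placeholder data, not the paper's actual values
new = [3.0, 3.2, 2.9]   # placeholder data
eps = 0.5               # hypothetical equivalence margin

diff = posterior_mean_samples(new) - posterior_mean_samples(old)
print(np.mean(np.abs(diff) < eps))   # posterior P(methods agree to within eps)
```

Unlike the original test, this directly quantifies support for the claim the authors actually cared about, and with only three points per group it will typically report that the data are nearly uninformative.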