gwern comments on Beyond Bayesians and Frequentists

gwern 31 Oct 2012 19:11 UTC
4 points
0

In fact, now that the year is 2012 the majority of new graduate students are being raised as Bayesians (at least in the U.S.) with frequentists thought of as stodgy emeritus professors stuck in their ways.

Is this actually true? Where would one get numbers on such a thing?
- IlyaShpitser 31 Oct 2012 19:24 UTC
  7 points
  0
  Parent
  No, it’s not true. This whole F vs B thing is such a false choice too. Does it make sense in computational complexity to have a holy war between average case and worst case analysis of algorithm running time? Maybe for people who go on holy wars as a hobby, but not as a serious thing.
  - Eliezer Yudkowsky 1 Nov 2012 7:00 UTC
    2 points
    0
    Parent
    
    Does it make sense in computational complexity to have a holy war between average case and worst case analysis of algorithm running time?
    
    Er, yes?
    - IlyaShpitser 1 Nov 2012 21:08 UTC
      19 points
      0
      Parent
      I don’t understand why this was linked as a response at all. Randomization is conjectured not to help in the sense that people think P = BPP. But there are cases where randomization does strictly help (Wikipedia has a partial list: http://en.wikipedia.org/wiki/Randomized_algorithm).
      
      My point was about sociology. Complexity theorists are not bashing each other’s heads in over whether worst case or average case analysis is “better,” they are proving theorems relating the approaches, with the understanding that in some algorithm analysis applications, it makes sense to take the “adversary view,” for example in real time systems that need strict guarantees. In other applications, typical running time is a more useful quantity. Nobody calls worst case analysis an apostate technique. Maybe that’s a good example to follow. Keep religion out of math, please.
      - jsteinhardt 2 Nov 2012 7:24 UTC
        12 points
        0
        Parent
        
        Keep religion out of math, please.
        
        I agree with this. That was supposed to be the point of the post.
        
        Randomization is conjectured not to help in the sense that people think P = BPP.
        
        Even if P = BPP, randomization still probably helps; P = BPP just means that randomization doesn’t help so much that it separates polynomial from non-polynomial.
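A standard illustration of randomization helping within polynomial time is Freivalds' check for a matrix product. The sketch below is not from the thread; the function names are hypothetical, and it assumes square integer matrices stored as lists of lists. Each round costs O(n^2), whereas the deterministic checks I know of recompute the product.

```python
import random


def matmul(a, b):
    """Naive O(n^3) product, used here only to build a correct reference."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matvec(m, v):
    """Matrix-vector product in O(n^2)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds(a, b, c, rounds=20, rng=random):
    """Randomized test of whether a @ b == c.

    Returns False as soon as a mismatch is detected. If c is wrong, a single
    round misses the error with probability at most 1/2, so passing all
    rounds leaves an error probability of at most 2 ** -rounds.
    """
    n = len(c[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compare a @ (b @ r) with c @ r: three O(n^2) products per round.
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True
```

A correct `c` always passes; an incorrect one is rejected except with small probability, which is the kind of polynomial-factor gain that P = BPP would not rule out.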
  - [deleted] 31 Oct 2012 20:53 UTC
    0 points
    0
    Parent
    Your analogy is imprecise. Average case and worst case analyses are both useful in their own right, and deal with different phenomena; F and B claim to deal with the same phenomena, but F is usually more vague about what assumptions its techniques follow from.
    
    A more apt analogy, in my opinion, would be between interpretations of QM. All of them claim to deal with the same phenomena, but some interpretations are more vague about the precise mechanism than others.
    - IlyaShpitser 31 Oct 2012 21:08 UTC
      5 points
      0
      Parent
      Why do you think F is more vague than B? I don’t think that’s true. LW folks (up to and including EY) are generally a lot more vague and imprecise when talking about statistics than professional statisticians using F for whatever reason, but they still seem to have strong opinions about B over F. It’s kinda culty, to be honest.
      
      Here’s a book by a smart F:
      
      http://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/0387402721
      
      The section on B stat is fairly funny.
      - [deleted] 31 Oct 2012 21:56 UTC
        3 points
        0
        Parent
        F techniques tend to make assumptions that are equivalent to establishing prior distributions, but because it’s easy to forget about these assumptions, many people use F techniques without considering what the assumptions mean. If you are explicit about establishing priors, however, this mostly evaporates.
        
        Notice that the point about your analogy was regarding area of application, not relative vagueness.
        
        I don’t have a strong personal opinion about F/B. This is just based on informal observations about F techniques versus B techniques.
        IlyaShpitser 31 Oct 2012 22:19 UTC
        2 points
        0
        Parent
        
        many people use F techniques without considering what the assumptions mean
        
        Can you name three examples of this happening?
        Eliezer Yudkowsky 1 Nov 2012 7:11 UTC
        5 points
        0
        Parent
        Here’s one: http://lesswrong.com/lw/f6o/original_research_on_less_wrong/7q1g
        Luke_A_Somers 5 Nov 2012 15:46 UTC
        3 points
        0
        Parent
        Every biology paper released based on a 5% P-value threshold without regard to the underlying plausibility of the connection. There are many effects where I wouldn’t take a 0.1% P-value to mean anything (see: kerfuffle over superluminal neutrinos), and some where I’d take a 10% P-value as a weak but notable degree of confirmation.
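The dependence on prior plausibility can be made concrete with a small calculation. This is a rough sketch, not the commenter's own method: it simplifies "the P-value came in under the threshold" to a yes/no event, and the 80% power and the prior values are hypothetical numbers chosen only for illustration.

```python
def posterior_given_significant(prior, alpha, power=0.8):
    """Probability the effect is real, given a result significant at level alpha.

    prior: assumed probability that the effect exists before seeing the data
    alpha: significance threshold (chance of a false positive under the null)
    power: assumed chance of a significant result when the effect is real
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# A plausible effect (prior 0.5) at the 5% threshold.
plausible = posterior_given_significant(prior=0.5, alpha=0.05)

# A wildly implausible effect (prior one in a million) at the 0.1% threshold.
implausible = posterior_given_significant(prior=1e-6, alpha=0.001)

# A plausible effect at a lenient 10% threshold.
lenient = posterior_given_significant(prior=0.5, alpha=0.10)
```

Under these assumed numbers the plausible effect ends up above 0.9, the lenient case stays near 0.89, and the implausible effect remains below 0.001 despite the much stricter threshold.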
        [deleted] 31 Oct 2012 22:23 UTC
        −2 points
        0
        Parent
        I could, but I doubt anything would come of it. Forget about the off-hand vagueness remark; the analogy still fails.
        IlyaShpitser 31 Oct 2012 22:29 UTC
        2 points
        0
        Parent
        “Area of app” depends on granularity: “analysis of running time” (e.g. “how long will this take, I haven’t got all day”) is an area of app, but if we are willing to drill in we can talk about distributions on input vs worst case as separate areas of app. I don’t really see a qualitative difference here: sometimes F is more appropriate, sometimes not. It really depends on how much we know about the problem and how paranoid we are being. Just as with algorithms—sometimes input distributions are reasonable, sometimes not.
        
        Or if we are being theoretical statisticians, our intended target for techniques we are developing. I am not sympathetic to “but the unwashed masses don’t really understand, therefore” kind of arguments. Math techniques don’t care, it’s best to use what’s appropriate.
        
        edit: in fact, let the utility function u(.) be the running time of an algorithm A, and the prior over theta the input distribution for algorithm A inputs. Now consider what the expectation for F vs the expectation for B is computing. This is a degenerate statistical problem, of course, but this isn’t even an analogy, it’s an isomorphism.
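One reading of the edit above can be sketched in code. The choice of insertion sort, the comparison count as the utility u, and the uniform prior over permutations are all assumptions made for illustration. The average over the input distribution plays the role of the Bayesian expectation, while the per-input cost guarded by its maximum plays the role of the worst-case view.

```python
from itertools import permutations


def insertion_sort_comparisons(seq):
    """Sort a copy of seq by insertion sort and return the number of comparisons."""
    a = list(seq)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break  # element is already in place
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comparisons


def running_time_summaries(n):
    """Return (average, worst) comparison counts over all permutations of n items."""
    costs = [insertion_sort_comparisons(p) for p in permutations(range(n))]
    average = sum(costs) / len(costs)  # expectation under a uniform prior on inputs
    worst = max(costs)                 # adversarial input, no prior needed
    return average, worst
```

For example, `running_time_summaries(4)` gives a worst case of 6 comparisons (the reversed input) and a strictly smaller average. Which number matters depends on whether the uniform input distribution is a reasonable assumption for the application.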
      - Cyan 1 Nov 2012 15:55 UTC
        2 points
        0
        Parent
        
        The section on B stat is fairly funny.
        
        No doubt about it, Larry Wasserman* is a smart guy. Unfortunately, that section isn’t his finest work. The normal prior example compares apples and oranges as discussed here, and the normalizing constant paradox analysis is just wrong, as LW himself discusses here.
        
        * I’m just a teeny bit jealous that his initials are “LW”. How awesome would that be?
- DaFranker 31 Oct 2012 19:39 UTC
  2 points
  0
  Parent
  Data point: One of our Montreal LW meetup members showed us a picture and description pulled from his Bayes stats/analysis class. The picture shows kiosks with the hippie Bayes person and the straight-suited, old-and-set-in-his-ways corporate clone. The general idea alongside it is that frequentist thinking is good for long-term verification and reliability tests, but that people who promote frequentism over Bayes when both are just as good are Doing Something Wrong (AKA sneering at the other tribe).
  - gwern 31 Oct 2012 20:34 UTC
    5 points
    0
    Parent
    I don’t think anyone needs anecdotes that Bayesian approaches are more popular than ever before or are a bona fide approach; I’m interested in the precise claim that now a majority of grad students identify as Bayesians. That is the interest.
    - DaFranker 31 Oct 2012 21:23 UTC
      0 points
      0
      Parent
      Ah, sorry for misunderstanding and going off on a tangent.
- jsteinhardt 31 Oct 2012 23:00 UTC
  1 point
  0
  Parent
  I don’t have precise numbers but this is my experience after having worked with ML groups at Cambridge, MIT, and Stanford. The next most common thing after Bayesians would be neural nets people if I had to guess (I don’t know what you want to label those as). Note that as a Bayesian-leaning person I may have a biased sample.
  
  I suspect Berkeley might be more frequentist but am unsure.
  - gwern 1 Nov 2012 0:48 UTC
    0 points
    0
    Parent
    I see.