No, it’s not true. This whole F vs B thing is such a false choice too. Does it make sense in computational complexity to have a holy war between average case and worst case analysis of algorithm running time? Maybe for people who go on holy wars as a hobby, but not as a serious thing.
Er, yes?
I don’t understand why this was linked as a response at all. Randomization is conjectured not to help in the sense that people think P = BPP. But there are cases where randomization does strictly help (wikipedia has a partial list: http://en.wikipedia.org/wiki/Randomized_algorithm).
My point was about sociology. Complexity theorists are not bashing each other’s heads in over whether worst-case or average-case analysis is “better”; they are proving theorems relating the two approaches, with the understanding that in some algorithm-analysis applications it makes sense to take the “adversary view,” for example in real-time systems that need strict guarantees. In other applications, typical running time is a more useful quantity. Nobody calls worst-case analysis an apostate technique. Maybe that’s a good example to follow. Keep religion out of math, please.
I agree with this. That was supposed to be the point of the post.
Randomization is conjectured not to help in the sense that people think P = BPP.

Even if P = BPP, randomization still probably helps; P = BPP just means that randomization doesn’t help so much that it separates polynomial from non-polynomial.
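To make that concrete, here is a minimal sketch (my own toy illustration, not something from the thread) of randomization buying a real speedup inside polynomial time: Freivalds’ algorithm checks whether A·B = C using O(n^2) work per trial with one-sided error, whereas the obvious deterministic check recomputes the whole product. The matrices and trial count below are arbitrary.

```python
import numpy as np

def freivalds_check(A, B, C, trials=20):
    """Probabilistically verify that A @ B == C.

    Each trial multiplies by a random 0/1 vector, costing O(n^2)
    instead of the O(n^3) (or n^omega) needed to recompute A @ B.
    A wrong C is caught with probability at least 1 - 2**(-trials);
    a correct C is never rejected.
    """
    n = C.shape[0]
    for _ in range(trials):
        r = np.random.randint(0, 2, size=(n, 1))
        # A @ (B @ r) and C @ r are both just matrix-vector products.
        if not np.array_equal(A @ (B @ r), C @ r):
            return False  # definitely wrong
    return True  # probably right

# Tiny usage example.
A = np.array([[2, 3], [3, 4]])
B = np.array([[1, 0], [1, 2]])
print(freivalds_check(A, B, A @ B))      # True
print(freivalds_check(A, B, A @ B + 1))  # almost certainly False
```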
Your analogy is imprecise. Average case and worst case analyses are both useful in their own right, and deal with different phenomena; F and B claim to deal with the same phenomena, but F is usually more vague about what assumptions its techniques follow from.
A more apt analogy, in my opinion, would be between interpretations of QM. All of them claim to deal with the same phenomena, but some interpretations are more vague about the precise mechanism than others.
Why do you think F is more vague than B? I don’t think that’s true. LW folks (up to and including EY) are generally a lot more vague and imprecise when talking about statistics than professional statisticians using F, for whatever reason, but still seem to have strong opinions about B over F. It’s kinda culty, to be honest.
Here’s a book by a smart F:
http://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/0387402721
The section on B stat is fairly funny.
F techniques tend to make assumptions that are equivalent to establishing prior distributions, but because it’s easy to forget about these assumptions, many people use F techniques without considering what the assumptions mean. If you are explicit about establishing priors, however, this mostly evaporates.
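Here is one way to make the “hidden prior” point concrete, as a minimal numerical sketch (assuming the textbook normal-mean, known-variance setup; the data and seed are arbitrary): the standard F 95% confidence interval coincides exactly with the B 95% credible interval you get once you are explicit about using a flat prior on the mean.

```python
import numpy as np
from scipy import stats

# Normal data with known sigma; unknown mean theta.
rng = np.random.default_rng(0)
sigma, n = 2.0, 25
data = rng.normal(loc=1.3, scale=sigma, size=n)
xbar, se = data.mean(), sigma / np.sqrt(n)

# F: 95% confidence interval, with no prior mentioned anywhere.
z = stats.norm.ppf(0.975)
ci = (xbar - z * se, xbar + z * se)

# B: 95% credible interval under an explicit flat prior on theta,
# for which the posterior is Normal(xbar, se^2).
cred = stats.norm.interval(0.95, loc=xbar, scale=se)

print(ci)    # the two intervals are numerically identical
print(cred)
```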
Notice that the point about your analogy was regarding area of application, not relative vagueness.
I don’t have a strong personal opinion about F/B. This is just based on informal observations about F techniques versus B techniques.
Can you name three examples of this happening?
Here’s one: http://lesswrong.com/lw/f6o/original_research_on_less_wrong/7q1g
Every biology paper released based on a 5% P-value threshold without regard to the underlying plausibility of the connection. There are many effects where I wouldn’t take a 0.1% P-value to mean anything (see: the kerfuffle over superluminal neutrinos), and some where I’d take a 10% P-value as a weak but notable degree of confirmation.
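A rough simulation of the point (the base rate, sample size, and effect size below are made-up illustrative numbers): if only 1% of tested hypotheses are real, most p < 0.05 results are false positives even though every one of them cleared the threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative setup: 100_000 two-sample studies, n per arm,
# effect size d when the effect is real, and only 1% of tested
# hypotheses correspond to real effects.
n_studies, n, d, base_rate = 100_000, 20, 0.5, 0.01
real = rng.random(n_studies) < base_rate

# Simulate a z-test per study: z ~ N(0, 1) under the null,
# z ~ N(d * sqrt(n / 2), 1) when the effect is real.
shift = d * np.sqrt(n / 2)
z = rng.normal(loc=np.where(real, shift, 0.0), scale=1.0)
p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values

sig = p < 0.05
print("significant results:", sig.sum())
print("fraction of 'discoveries' that are real effects:",
      real[sig].mean())  # well under 1/2 in this setup
```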
I could, but I doubt anything would come of it. Forget about the off-hand vagueness remark; the analogy still fails.
“Area of application” depends on granularity: “analysis of running time” (e.g. “how long will this take, I haven’t got all day”) is an area of application, but if we are willing to drill in, we can talk about distributions on input vs. worst case as separate areas of application. I don’t really see a qualitative difference here: sometimes F is more appropriate, sometimes not. It really depends on how much we know about the problem and how paranoid we are being. Just as with algorithms: sometimes input distributions are reasonable, sometimes not.
Or, if we are being theoretical statisticians, on our intended target for the techniques we are developing. I am not sympathetic to the “but the unwashed masses don’t really understand, therefore...” kind of argument. Math techniques don’t care; it’s best to use what’s appropriate.
edit: in fact, let the utility function u(·) be the running time of an algorithm A, and let the prior over theta be the input distribution over A’s inputs. Now consider what the expectation for F vs. the expectation for B is computing. This is a degenerate statistical problem, of course, but this isn’t even an analogy; it’s an isomorphism.
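A toy version of that isomorphism (insertion sort stands in for algorithm A, and a uniform distribution over permutations stands in for the prior over theta; both are my own illustrative choices): the B-style expectation over the prior is exactly average-case running time, and taking the worst case over theta is exactly worst-case running time.

```python
from itertools import permutations

def insertion_sort_comparisons(xs):
    """Return the number of comparisons insertion sort makes on xs."""
    xs, count = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            count += 1
            if xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
            else:
                break
    return count

# "theta" ranges over inputs; take all permutations of 6 elements,
# with a uniform prior over them as the input distribution.
inputs = list(permutations(range(6)))
costs = [insertion_sort_comparisons(p) for p in inputs]

# B-style expectation over the prior = average-case running time.
print("average case:", sum(costs) / len(costs))
# F-style worst case over theta = worst-case running time.
print("worst case:  ", max(costs))
```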
No doubt about it, Larry Wasserman* is a smart guy. Unfortunately, that section isn’t his finest work. The normal prior example compares apples and oranges as discussed here, and the normalizing constant paradox analysis is just wrong, as LW himself discusses here.
* I’m just a teeny bit jealous that his initials are “LW”. How awesome would that be?