In fact, now that the year is 2012, the majority of new graduate students are being raised as Bayesians (at least in the U.S.), with frequentists thought of as stodgy emeritus professors stuck in their ways.
Is this actually true? Where would one get numbers on such a thing?
No, it’s not true. This whole F vs B thing is such a false choice too. Does it make sense in computational complexity to have a holy war between average case and worst case analysis of algorithm running time? Maybe for people who go on holy wars as a hobby, but not as a serious thing.
Er, yes?
I don’t understand why this was linked as a response at all. Randomization is conjectured not to help in the sense that people think P = BPP. But there are cases where randomization does strictly help (Wikipedia has a partial list: http://en.wikipedia.org/wiki/Randomized_algorithm).
My point was about sociology. Complexity theorists are not bashing each other’s heads in over whether worst case or average case analysis is “better”; they are proving theorems relating the approaches, with the understanding that in some algorithm analysis applications it makes sense to take the “adversary view,” for example in real-time systems that need strict guarantees. In other applications, typical running time is a more useful quantity. Nobody calls worst case analysis an apostate technique. Maybe that’s a good example to follow. Keep religion out of math, please.
I agree with this. That was supposed to be the point of the post.
Even if P = BPP, randomization still probably helps; P = BPP just means that randomization doesn’t help so much that it separates polynomial from non-polynomial.
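One standard illustration of a polynomial-factor gain from randomization (not taken from this thread) is Freivalds' check of a matrix product. Each round costs O(n^2), while recomputing the product with known deterministic methods costs more than that. The sketch below assumes square integer matrices stored as lists of lists; the function name and the `rounds` and `rng` parameters are illustrative choices.

```python
import random


def freivalds(a, b, c, rounds=20, rng=random):
    """Probabilistically check whether a @ b == c for n x n integer matrices.

    Each round multiplies by a random 0/1 vector, which costs O(n^2).
    A wrong c is caught in any single round with probability at least 1/2,
    so it slips through all rounds with probability at most 2 ** -rounds.
    """
    n = len(a)
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compute a @ (b @ r) and c @ r using only matrix-vector products.
        br = [sum(b[i][j] * r[j] for j in range(n)) for i in range(n)]
        abr = [sum(a[i][j] * br[j] for j in range(n)) for i in range(n)]
        cr = [sum(c[i][j] * r[j] for j in range(n)) for i in range(n)]
        if abr != cr:
            return False  # definitely not equal
    return True  # equal, or an unlucky run on a wrong c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]  # the true product a @ b
bad = [[19, 22], [43, 51]]   # one entry off
```

A correct product always passes. A wrong one is rejected unless every round happens to draw a vector that hides the error, so the answer is one-sided: `False` is certain, `True` is only very likely.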
Your analogy is imprecise. Average case and worst case analyses are both useful in their own right, and deal with different phenomena; F and B claim to deal with the same phenomena, but F is usually more vague about what assumptions its techniques follow from.
A more apt analogy, in my opinion, would be between interpretations of QM. All of them claim to deal with the same phenomena, but some interpretations are more vague about the precise mechanism than others.
Why do you think F is more vague than B? I don’t think that’s true. LW folks (up to and including EY) are generally a lot more vague and imprecise when talking about statistics than professional statisticians using F, for whatever reason, but still seem to have strong opinions about B over F. It’s kinda culty, to be honest.
Here’s a book by a smart F:
http://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/0387402721
The section on B stat is fairly funny.
F techniques tend to make assumptions that are equivalent to establishing prior distributions, but because it’s easy to forget about these assumptions, many people use F techniques without considering what the assumptions mean. If you are explicit about establishing priors, however, this mostly evaporates.
Notice that the point about your analogy was regarding area of application, not relative vagueness.
I don’t have a strong personal opinion about F/B. This is just based on informal observations about F techniques versus B techniques.
Can you name three examples of this happening?
Here’s one: http://lesswrong.com/lw/f6o/original_research_on_less_wrong/7q1g
Every biology paper released based on a 5% P-value threshold without regard to the underlying plausibility of the connection. There are many effects where I wouldn’t take a 0.1% P-value to mean anything (see: kerfuffle over superluminal neutrinos), and some where I’d take a 10% P-value as a weak but notable degree of confirmation.
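To put rough numbers on why the same threshold can mean very different things, here is a minimal sketch using Bayes' rule. It simplifies a P-value to a pass/fail test with false-positive rate `alpha`, and the prior plausibilities and the 80% power figure are made-up assumptions for illustration, not values from this thread.

```python
def posterior_given_significant(prior, alpha, power):
    """P(effect is real | result was significant), by Bayes' rule.

    prior: assumed probability the effect is real before seeing data
    alpha: significance threshold (false-positive rate under the null)
    power: assumed probability of a significant result if the effect is real
    """
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Wildly implausible effect, very strict threshold (hypothetical numbers).
implausible = posterior_given_significant(prior=1e-6, alpha=0.001, power=0.8)

# Plausible effect, lenient threshold (hypothetical numbers).
plausible = posterior_given_significant(prior=0.5, alpha=0.10, power=0.8)
```

Under these assumed inputs the strict-threshold result for the implausible effect still leaves it very unlikely to be real, while the lenient-threshold result for the plausible effect is decent evidence. The ordering, not the exact figures, is the point.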
I could, but I doubt anything would come of it. Forget about the off-hand vagueness remark; the analogy still fails.
“Area of app” depends on granularity: “analysis of running time” (e.g. “how long will this take, I haven’t got all day”) is an area of app, but if we are willing to drill in we can talk about distributions on input vs worst case as separate areas of app. I don’t really see a qualitative difference here: sometimes F is more appropriate, sometimes not. It really depends on how much we know about the problem and how paranoid we are being. Just as with algorithms—sometimes input distributions are reasonable, sometimes not.
Or if we are being theoretical statisticians, our intended target for techniques we are developing. I am not sympathetic to “but the unwashed masses don’t really understand, therefore” kind of arguments. Math techniques don’t care, it’s best to use what’s appropriate.
edit: in fact, let the utility function u(.) be the running time of an algorithm A, and the prior over theta the input distribution for algorithm A inputs. Now consider what the expectation for F vs the expectation for B is computing. This is a degenerate statistical problem, of course, but this isn’t even an analogy, it’s an isomorphism.
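One way to read the edit above is sketched here with linear search standing in for algorithm A (my choice, not the commenter's). The input theta is the search target, u is the number of comparisons, average-case analysis is the expectation of u under a prior over targets, and worst-case analysis is the maximum of u over targets. Mapping the worst case to the "F" side is an assumption about what the comment intends.

```python
from fractions import Fraction


def comparisons(items, target):
    """u(theta): comparisons linear search makes to find target in items."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)  # target absent: every item was compared


def worst_case(items):
    """Adversary view: maximise the cost over all possible targets."""
    return max(comparisons(items, t) for t in items)


def average_case(items, prior):
    """Expected cost when targets are drawn from the given prior."""
    return sum(prior[t] * comparisons(items, t) for t in items)


items = list(range(10))
uniform = {t: Fraction(1, 10) for t in items}
# A prior that puts most of its mass on the first element.
skewed = {t: Fraction(9, 10) if t == 0 else Fraction(1, 90) for t in items}

worst = worst_case(items)                 # 10
avg_uniform = average_case(items, uniform)  # 11/2
avg_skewed = average_case(items, skewed)    # 3/2
```

The worst case ignores the prior entirely, while the average case moves from 11/2 to 3/2 when the prior changes. Which number is more useful depends on how much the input distribution can be trusted.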
No doubt about it, Larry Wasserman* is a smart guy. Unfortunately, that section isn’t his finest work. The normal prior example compares apples and oranges as discussed here, and the normalizing constant paradox analysis is just wrong, as LW himself discusses here.
* I’m just a teeny bit jealous that his initials are “LW”. How awesome would that be?
Data point: One of our Montreal LW meetup members showed us a picture and description pulled from his Bayes stats/analysis class, and the picture shows kiosks with the hippie Bayes person and the straight-suited old-and-set-in-his-ways corporate clone, along with the general idea that frequentist thinking is good for long-term verification and reliability tests, but that people who promote frequentism over Bayes when both are just as good are Doing Something Wrong (AKA sneer at the other tribe).
I don’t think anyone needs anecdotes that Bayesian approaches are more popular than ever before or are a bona fide approach; I’m interested in the precise claim that now a majority of grad students identify as Bayesians. That is the interest.
Ah, sorry for misunderstanding and going off on a tangent.
I don’t have precise numbers but this is my experience after having worked with ML groups at Cambridge, MIT, and Stanford. The next most common thing after Bayesians would be neural nets people if I had to guess (I don’t know what you want to label those as). Note that as a Bayesian-leaning person I may have a biased sample.
I suspect Berkeley might be more frequentist but am unsure.
I see.