No, it’s not true. This whole F vs B thing is such a false choice too. Does it make sense in computational complexity to have a holy war between average case and worst case analysis of algorithm running time? Maybe for people who go on holy wars as a hobby, but not as a serious thing.
Er, yes?
I don’t understand why this was linked as a response at all. Randomization is conjectured not to help in the sense that people think P = BPP. But there are cases where randomization does strictly help (wikipedia has a partial list: http://en.wikipedia.org/wiki/Randomized_algorithm).
My point was about sociology. Complexity theorists are not bashing each other’s heads in over whether worst-case or average-case analysis is “better”; they are proving theorems relating the two approaches, with the understanding that in some algorithm-analysis applications it makes sense to take the “adversary view,” for example in real-time systems that need strict guarantees. In other applications, typical running time is a more useful quantity. Nobody calls worst-case analysis an apostate technique. Maybe that’s a good example to follow. Keep religion out of math, please.
I agree with this. That was supposed to be the point of the post.
Randomization is conjectured not to help in the sense that people think P = BPP.

Even if P = BPP, randomization still probably helps; P = BPP just means that randomization doesn’t help so much that it separates polynomial from non-polynomial.
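To make that concrete, here is a minimal sketch (my own toy illustration, not something from the thread) of randomization buying a real speedup inside polynomial time: Freivalds’ algorithm checks whether A·B = C using O(n^2) work per trial with one-sided error, whereas the obvious deterministic check recomputes the whole product. The matrices and trial count below are arbitrary.

```python
import numpy as np

def freivalds_check(A, B, C, trials=20):
    """Probabilistically verify that A @ B == C.

    Each trial multiplies by a random 0/1 vector, costing O(n^2)
    instead of the O(n^3) (or n^omega) needed to recompute A @ B.
    A wrong C is caught with probability at least 1 - 2**(-trials);
    a correct C is never rejected.
    """
    n = C.shape[0]
    for _ in range(trials):
        r = np.random.randint(0, 2, size=(n, 1))
        # A @ (B @ r) and C @ r are both just matrix-vector products.
        if not np.array_equal(A @ (B @ r), C @ r):
            return False  # definitely wrong
    return True  # probably right

# Tiny usage example.
A = np.array([[2, 3], [3, 4]])
B = np.array([[1, 0], [1, 2]])
print(freivalds_check(A, B, A @ B))      # True
print(freivalds_check(A, B, A @ B + 1))  # almost certainly False
```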
Your analogy is imprecise. Average case and worst case analyses are both useful in their own right, and deal with different phenomena; F and B claim to deal with the same phenomena, but F is usually more vague about what assumptions its techniques follow from.
A more apt analogy, in my opinion, would be between interpretations of QM. All of them claim to deal with the same phenomena, but some interpretations are more vague about the precise mechanism than others.
Why do you think F is more vague than B? I don’t think that’s true. LW folks (up to and including EY) are generally a lot more vague and imprecise when talking about statistics than professional statisticians using F, for whatever reason, but still seem to have strong opinions about B over F. It’s kinda culty, to be honest.
Here’s a book by a smart F:
http://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/0387402721
The section on B stat is fairly funny.
F techniques tend to make assumptions that are equivalent to establishing prior distributions, but because it’s easy to forget about these assumptions, many people use F techniques without considering what the assumptions mean. If you are explicit about establishing priors, however, this mostly evaporates.
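Here is one way to make the “hidden prior” point concrete, as a minimal numerical sketch (assuming the textbook normal-mean, known-variance setup; the data and seed are arbitrary): the standard F 95% confidence interval coincides exactly with the B 95% credible interval you get once you are explicit about using a flat prior on the mean.

```python
import numpy as np
from scipy import stats

# Normal data with known sigma; unknown mean theta.
rng = np.random.default_rng(0)
sigma, n = 2.0, 25
data = rng.normal(loc=1.3, scale=sigma, size=n)
xbar, se = data.mean(), sigma / np.sqrt(n)

# F: 95% confidence interval, with no prior mentioned anywhere.
z = stats.norm.ppf(0.975)
ci = (xbar - z * se, xbar + z * se)

# B: 95% credible interval under an explicit flat prior on theta,
# for which the posterior is Normal(xbar, se^2).
cred = stats.norm.interval(0.95, loc=xbar, scale=se)

print(ci)    # the two intervals are numerically identical
print(cred)
```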
Notice that the point about your analogy was regarding area of application, not relative vagueness.
I don’t have a strong personal opinion about F/B. This is just based on informal observations about F techniques versus B techniques.
Can you name three examples of this happening?
Here’s one: http://lesswrong.com/lw/f6o/original_research_on_less_wrong/7q1g
Every biology paper released based on a 5% P-value threshold without regard to the underlying plausibility of the connection. There are many effects where I wouldn’t take a 0.1% P-value to mean anything (see: the kerfuffle over superluminal neutrinos), and some where I’d take a 10% P-value as a weak but notable degree of confirmation.
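A rough simulation of the point (the base rate, sample size, and effect size below are made-up illustrative numbers): if only 1% of tested hypotheses are real, most p < 0.05 results are false positives even though every one of them cleared the threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative setup: 100_000 two-sample studies, n per arm,
# effect size d when the effect is real, and only 1% of tested
# hypotheses correspond to real effects.
n_studies, n, d, base_rate = 100_000, 20, 0.5, 0.01
real = rng.random(n_studies) < base_rate

# Simulate a z-test per study: z ~ N(0, 1) under the null,
# z ~ N(d * sqrt(n / 2), 1) when the effect is real.
shift = d * np.sqrt(n / 2)
z = rng.normal(loc=np.where(real, shift, 0.0), scale=1.0)
p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values

sig = p < 0.05
print("significant results:", sig.sum())
print("fraction of 'discoveries' that are real effects:",
      real[sig].mean())  # well under 1/2 in this setup
```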
I could, but I doubt anything would come of it. Forget about the off-hand vagueness remark; the analogy still fails.
“Area of application” depends on granularity: “analysis of running time” (e.g. “how long will this take, I haven’t got all day”) is an area of application, but if we are willing to drill in, we can talk about distributions on input vs. worst case as separate areas of application. I don’t really see a qualitative difference here: sometimes F is more appropriate, sometimes not. It really depends on how much we know about the problem and how paranoid we are being. Just as with algorithms: sometimes input distributions are reasonable, sometimes not.
Or, if we are being theoretical statisticians, on our intended target for the techniques we are developing. I am not sympathetic to the “but the unwashed masses don’t really understand, therefore...” kind of argument. Math techniques don’t care; it’s best to use what’s appropriate.
edit: in fact, let the utility function u(·) be the running time of an algorithm A, and let the prior over theta be the input distribution over A’s inputs. Now consider what the expectation for F vs. the expectation for B is computing. This is a degenerate statistical problem, of course, but this isn’t even an analogy; it’s an isomorphism.
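A toy version of that isomorphism (insertion sort stands in for algorithm A, and a uniform distribution over permutations stands in for the prior over theta; both are my own illustrative choices): the B-style expectation over the prior is exactly average-case running time, and taking the worst case over theta is exactly worst-case running time.

```python
from itertools import permutations

def insertion_sort_comparisons(xs):
    """Return the number of comparisons insertion sort makes on xs."""
    xs, count = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            count += 1
            if xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
            else:
                break
    return count

# "theta" ranges over inputs; take all permutations of 6 elements,
# with a uniform prior over them as the input distribution.
inputs = list(permutations(range(6)))
costs = [insertion_sort_comparisons(p) for p in inputs]

# B-style expectation over the prior = average-case running time.
print("average case:", sum(costs) / len(costs))
# F-style worst case over theta = worst-case running time.
print("worst case:  ", max(costs))
```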
No doubt about it, Larry Wasserman* is a smart guy. Unfortunately, that section isn’t his finest work. The normal prior example compares apples and oranges as discussed here, and the normalizing constant paradox analysis is just wrong, as LW himself discusses here.
* I’m just a teeny bit jealous that his initials are “LW”. How awesome would that be?