I agree that “frequentists do it wrong so often” is more because science is done by humans than due to any flaw in frequentist techniques. I also share your expectation that increased popularity of Bayesian techniques with more moving parts is likely to lead to more, not less, motivated selective use.
The online learning example, though, seems very unambitious in its loss function. Ideally you’d do better than any individual estimator in the underlying data. Say I were trying to use answers on the SAT to predict college GPAs. A method that is no less predictive than the best individual question can still be pretty bad. A simple aggregate like the count of correct answers will very likely do better.
There are lots of statistical techniques that sound almost magical in their robustness, until you notice that this comes at the price of not narrowing our uncertainty very much. (Although some frequentist statistical measures are not bad at this either.)
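To make the SAT point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the latent "ability" model, the logistic response curve, the noise levels, and the names `pearson` and `simulate` are hypothetical, and no real SAT or GPA data is involved. It compares how well the single most predictive question tracks GPA against how well the plain count of correct answers does.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_students=2000, n_questions=50, seed=0):
    """Return (best single-question |correlation|, correlation of the total score).

    Assumed toy model: each student has a latent ability; each question is
    answered correctly with a logistic probability in (ability - difficulty);
    GPA is ability plus independent noise.
    """
    rng = random.Random(seed)
    difficulty = [rng.uniform(-1.5, 1.5) for _ in range(n_questions)]
    answers, gpa = [], []
    for _ in range(n_students):
        ability = rng.gauss(0, 1)
        row = [
            1 if rng.random() < 1 / (1 + math.exp(-(ability - d))) else 0
            for d in difficulty
        ]
        answers.append(row)
        gpa.append(ability + rng.gauss(0, 1))
    best_single = max(
        abs(pearson([row[q] for row in answers], gpa))
        for q in range(n_questions)
    )
    total = pearson([sum(row) for row in answers], gpa)
    return best_single, total


best_single, total = simulate()
print(f"best single question: {best_single:.2f}, count of correct answers: {total:.2f}")
```

Under these assumptions the count of correct answers should correlate with GPA noticeably better than any one question, which is the sense in which matching the best individual predictor is a weak target.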
The online learning example, though, seems very unambitious in its loss function. Ideally you’d do better than any individual estimator in the underlying data.
I think this is good intuition; I’ll just point out that the example I gave is much simpler than what you can actually ask for. For instance, if I want to be competitive with the best linear combination of at most k of the predictors, then I can do this with k log(n)/epsilon^2 rounds. If I want to be competitive with the best overall combination that uses all n predictors, I can do this with n/epsilon^2 rounds. The guarantees scale pretty gracefully with the problem instance.
(Another thing I’ll point out in passing is that you’re perfectly free to throw in “fraction of correct answers” as an additional predictor. That doesn’t address the core of your point, but I think the preceding paragraph does.)
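For readers who want to see the simplest version of such a guarantee, here is a minimal sketch of the standard Hedge (exponential weights) algorithm. It only covers competing with the best single predictor, not the k-sparse or full linear combinations mentioned above, and it is not necessarily the exact procedure the post had in mind. The names `hedge` and `rounds_needed` are hypothetical, and losses are assumed to lie in [0, 1]. With learning rate eta = sqrt(8 ln(n)/T), the textbook bound on total regret is sqrt(T ln(n)/2), so the average regret falls below epsilon once T is at least ln(n)/(2 epsilon^2).

```python
import math
import random


def hedge(expert_losses, eta):
    """Run Hedge on a sequence of per-round expert losses in [0, 1].

    expert_losses: list of rounds, each a list with one loss per expert.
    Returns (learner's cumulative expected loss, best expert's cumulative loss).
    """
    n = len(expert_losses[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in expert_losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner follows expert i with probability probs[i].
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)  # downweight experts that did badly
    return learner_loss, min(cumulative)


def rounds_needed(n, epsilon):
    """Rounds after which the textbook bound puts average regret at most epsilon."""
    return math.ceil(math.log(n) / (2 * epsilon ** 2))


# Usage on arbitrary (here: random) losses; the bound holds for any sequence.
rng = random.Random(1)
T, n = 2000, 10
losses = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)
learner, best = hedge(losses, eta)
print(f"regret: {learner - best:.2f}, bound: {math.sqrt(T * math.log(n) / 2):.2f}")
```

Adding “fraction of correct answers” as an extra predictor, as suggested in the parenthetical above, would just mean one more column in `expert_losses`; the bound then depends on ln(n + 1) instead of ln(n).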
I agree that “frequentists do it wrong so often” is more because science is done by humans than due to any flaw in frequentist techniques. I also share your expectation that increased popularity of Bayesian techniques with more moving parts is likely to lead to more, not less, motivated selective use.
Another important reason is that basic frequentist statistics is quite complicated and non-intuitive for the vast majority of even highly educated and mathematically literate people (engineers, for example). Bayesian statistics is dramatically simpler and more intuitive at the basic level.
A practitioner who knows basic Bayesian statistics can easily invent new models and know how to solve them conceptually (though often not practically). A practitioner who knows basic frequentist statistics cannot.
Great post! I learned a lot.
Thanks! Glad you enjoyed it.