Thanks for the write-up. I read it as arguing that almost any prediction can be interpreted in the Bayesian framework, which I think is a weaker claim.
However, there are issues with treating it as the only right way, because it leaves a number of important questions unanswered. For example, how do you pick the prior? How do you assemble your set of possible outcomes (=hypotheses)? What happens if your forecast influences the result?
I also think that being uncomputable is a bigger deal than you make it out to be.
I think that the claim that any prediction can be interpreted in this minimal and consistent framework without exceptions whatsoever is a rather strong claim, and I don’t think I want to claim much more than that (although I do want to add that if we have such a unique framework that is both minimal and complete when it comes to making predictions, then that seems like a very natural choice for Statistics with a capital S).
I don’t think we’re going to agree about the importance of computability without more context. I agree that every time I try to build myself a nice Bayesian algorithm I run into the problem of uncomputability, but personally I consider Bayesian statistics to be more of a method of evaluating algorithms than a method for creating them (although Bayesian statistics is by no means limited to this!).
As for your other questions: it is important to note that your issues are issues with Bayesian statistics as much as they are issues with any other form of prediction making. To pick a frequentist algorithm is to pick a prior with a set of hypotheses, i.e. to make Bayes’ Theorem computable and provide the unknowns on the r.h.s. above (as mentioned earlier you can in theory extract the prior and set of hypotheses from an algorithm by considering which outcome your algorithm would give when it saw a certain set of data, and then inverting Bayes’ Theorem to find the unknowns. At least, I think this is possible (it has worked so far)). And indeed picking the prior and set of hypotheses is not an easy task; this is precisely what leads to different competing algorithms in the field of statistics.
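To make the "inverting Bayes' Theorem" idea concrete, here is a minimal sketch for one toy case only. It assumes the data are coin flips and that the algorithm's output can be matched by the posterior mean under some Beta(a, b) prior, which is (k + a) / (n + a + b) after k successes in n trials. The function names (`recover_beta_prior`, `matches_posterior_mean`) and the example estimators are hypothetical, chosen for illustration; nothing here shows that such an inversion exists in general.

```python
from typing import Callable, Tuple

Estimator = Callable[[int, int], float]  # (successes k, trials n) -> estimate


def recover_beta_prior(estimator: Estimator, n: int = 10) -> Tuple[float, float]:
    """Solve estimator(k, n) = (k + a) / (n + a + b) for a and b.

    Uses the two extreme data sets k = 0 and k = n. Assumes the estimator
    is not constant in k (otherwise the division below fails).
    """
    lo = estimator(0, n)          # equals a / (n + a + b) if the family fits
    hi = estimator(n, n)          # equals (n + a) / (n + a + b)
    total = n / (hi - lo) - n     # a + b, since hi - lo = n / (n + a + b)
    a = lo * (n + total)
    b = total - a
    return a, b


def matches_posterior_mean(estimator: Estimator, a: float, b: float,
                           n: int = 10, tol: float = 1e-9) -> bool:
    """Check the recovered prior reproduces the estimator for every k."""
    return all(abs(estimator(k, n) - (k + a) / (n + a + b)) < tol
               for k in range(n + 1))


def add_one(k: int, n: int) -> float:
    """Add-one (Laplace) smoothing."""
    return (k + 1) / (n + 2)


def mle(k: int, n: int) -> float:
    """Plain relative frequency."""
    return k / n


def squared(k: int, n: int) -> float:
    """An arbitrary estimator that is not a Beta posterior mean."""
    return (k / n) ** 2


a1, b1 = recover_beta_prior(add_one)   # close to a = 1, b = 1 (uniform prior)
a0, b0 = recover_beta_prior(mle)       # close to a = 0, b = 0 (an improper prior)
a2, b2 = recover_beta_prior(squared)

print(a1, b1, matches_posterior_mean(add_one, a1, b1))
print(a0, b0, matches_posterior_mean(mle, a0, b0))
print(a2, b2, matches_posterior_mean(squared, a2, b2))
```

In this sketch add-one smoothing comes back as a uniform prior, the relative frequency comes back as the limiting case a = b = 0, and the third estimator fails the consistency check, so within this restricted family the inversion does not always succeed.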
To pick a frequentist algorithm is to pick a prior with a set of hypotheses, i.e. to make Bayes’ Theorem computable and provide the unknowns on the r.h.s. above (as mentioned earlier you can in theory extract the prior and set of hypotheses from an algorithm by considering which outcome your algorithm would give when it saw a certain set of data, and then inverting Bayes’ Theorem to find the unknowns.
Okay, this is the last thing I’ll say here until/unless you engage with the Robins and Wasserman post that IlyaShpitser and I have been suggesting you look at. You can indeed pick a prior and hypotheses (and I guess a way to go from posterior to point estimate, e.g., MAP, posterior mean, etc.) so that your Bayesian procedure does the same thing as your non-Bayesian procedure for any realization of the data. The problem is that in the Robins-Ritov example, your prior may need to depend on the data to do this! Mechanically, this is no problem; philosophically, you’re updating on the data twice and it’s hard to argue that doing this is unproblematic. In other situations, you may need to do other unsavory things with your prior. If the non-Bayesian procedure that works well looks like a Bayesian procedure that makes insane assumptions, why should we look to Bayesianism as a foundation for statistics?
(I may be willing to bite the bullet of poor frequentist performance in some cases for philosophical purity, but I damn well want to make sure I understand what I’m giving up. It is supremely dishonest to pretend there’s no trade-off present in this situation. And a Bayes-first education doesn’t even give you the concepts to see what you gain and what you lose by being a Bayesian.)
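As a rough illustration of what "updating on the data twice" does mechanically, here is a toy sketch. It is not the Robins-Ritov construction; it assumes a plain normal model with known noise variance, and the data values, the prior settings, and the helper name `normal_posterior` are all made up for the example.

```python
from statistics import fmean
from typing import Sequence, Tuple


def normal_posterior(prior_mean: float, prior_var: float,
                     data: Sequence[float], noise_var: float) -> Tuple[float, float]:
    """Conjugate update for a normal mean with known noise variance."""
    n = len(data)
    xbar = fmean(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * xbar / noise_var)
    return post_mean, post_var


data = [4.1, 5.3, 4.8, 5.6, 5.2]   # hypothetical observations
noise_var = 1.0                    # assumed known
n = len(data)
xbar = fmean(data)

# Prior fixed before looking at the data: broad and centered at zero.
honest_mean, honest_var = normal_posterior(0.0, 100.0, data, noise_var)

# Prior built from the same data: centered on the sample mean, with the
# sampling variance of that mean. The data then get used a second time.
double_mean, double_var = normal_posterior(xbar, noise_var / n, data, noise_var)

print(honest_mean, honest_var)
print(double_mean, double_var)     # variance is noise_var / (2 * n)
```

With the data-dependent prior the posterior variance is noise_var / (2n), the same as if twice as many observations had been collected, which is one simple way to see why the step is hard to defend even though the arithmetic goes through.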
the claim that any prediction can be interpreted in this minimal and consistent framework without exceptions whatsoever is a rather strong claim
Bayes’ Rule by itself is not a framework. It’s just a particular statistical operation, useful no doubt, but hardly rising to the level of a framework.
The claim that you can interpret any prediction as forecasting a particular probability distribution has nothing to do with Bayes. For example, let’s say that an analyst predicts the average growth in the GDP of China for the next five years to be 5%. If we dig and poke we can re-express this as a forecast of something like a normal distribution centered at 5% and with some width which corresponds to the expected error—so there is your forecast probability distribution. But is there a particular prior here? Any specific pieces of evidence on which the analyst updated the prior? Um, not really.
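The re-expression of the point forecast can be sketched in a few lines. The 5% center is from the example above; the width (a standard deviation of 1 percentage point) is purely an assumption for illustration, since the analyst's expected error is not given. Note that no prior and no update appear anywhere in it.

```python
from statistics import NormalDist

# Point forecast of 5% average growth, re-expressed as a distribution.
# sigma = 1.0 percentage point is a hypothetical stand-in for the
# analyst's expected error.
forecast = NormalDist(mu=5.0, sigma=1.0)

# Probability that growth lands within one point of the forecast.
p_within_one_point = forecast.cdf(6.0) - forecast.cdf(4.0)

# Central 90% interval implied by the forecast distribution.
low = forecast.inv_cdf(0.05)
high = forecast.inv_cdf(0.95)

print(f"P(4% <= growth <= 6%) = {p_within_one_point:.3f}")   # about 0.683
print(f"90% interval: {low:.2f}% to {high:.2f}%")            # about 3.36% to 6.64%
```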