Thank you for this post! From a statistics (rather than computer science) background, I have encountered similar discussions in the context of Bayesian model averaging, and in particular I would recommend this publication if you don’t already know about it:
“Using Stacking to Average Bayesian Predictive Distributions”
https://sites.stat.columbia.edu/gelman/research/published/stacking_paper_discussion_rejoinder.pdf
One of the main limitations they note about Bayes factors, the classical form of Bayesian model averaging, is that they are sensitive to how vague your initial priors were for the adjustable parameters of your competing models, so I'm not sure how much this applies to your example. It depends on whether you think of your competing hypotheses as having free parameters to estimate before making the comparison. (The same point, that Bayes factors evaluate your initial priors, is also made here on Gelman's blog: https://statmodeling.stat.columbia.edu/2023/10/14/bayes-factors-prior-cross-validation-posterior/)
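To make the prior-sensitivity point concrete, here is a minimal sketch (a toy example of my own, not from the paper): a point null against an alternative whose free mean parameter gets a N(0, tau^2) prior. The Bayes factor has a closed form here, and making the prior vaguer mechanically pushes the evidence toward the null, regardless of the data:

```python
# Toy illustration of Bayes factor sensitivity to prior vagueness.
# M0: x_i ~ N(0, 1).  M1: x_i ~ N(mu, 1) with free parameter mu ~ N(0, tau^2).
# Integrating mu out gives the closed form
#   BF_10 = N(xbar; 0, tau^2 + 1/n) / N(xbar; 0, 1/n),
# so BF_10 -> 0 as tau -> infinity, whatever the data say.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50
x = rng.normal(0.3, 1.0, size=n)  # data actually generated with mu = 0.3
xbar = x.mean()

for tau in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    log_bf10 = (norm.logpdf(xbar, 0.0, np.sqrt(tau**2 + 1/n))
                - norm.logpdf(xbar, 0.0, np.sqrt(1/n)))
    print(f"tau = {tau:7.1f}   log BF_10 = {log_bf10:+.2f}")
# Widening the prior on mu shifts the evidence toward M0 even though
# the data favor a nonzero mean.
```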
That said, the stacking paper has a broader message in my view. What they are saying is: “If you want to use a weighted average of different models for prediction, why not directly optimize the weights for minimal (validation) loss?”
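As a rough illustration of that idea (my own sketch, using squared-error stacking of point predictions rather than the paper's log-score stacking of predictive distributions): given each model's out-of-sample predictions, you directly optimize simplex-constrained weights against the validation loss.

```python
# Minimal stacking sketch: choose simplex-constrained weights for the
# models' held-out predictions by minimizing validation squared error.
import numpy as np
from scipy.optimize import minimize

def stack_weights(val_preds, y_val):
    """val_preds: (n_models, n_val) out-of-sample predictions; y_val: (n_val,)."""
    k = val_preds.shape[0]

    def loss(w):
        return np.mean((w @ val_preds - y_val) ** 2)

    res = minimize(
        loss,
        x0=np.full(k, 1.0 / k),                        # start at uniform weights
        bounds=[(0.0, 1.0)] * k,                       # each weight in [0, 1]
        constraints={"type": "eq",
                     "fun": lambda w: w.sum() - 1.0},  # weights sum to 1
        method="SLSQP",
    )
    return res.x

# Toy usage: model 1 is biased, model 2 is noisy; stacking finds a blend.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
preds = np.vstack([y + 0.5, y + rng.normal(0.0, 1.0, 200)])
print(stack_weights(preds, y))  # weights favor the lower-error model
```

Unlike Bayes factor weights, these stacking weights depend on the models only through their held-out predictions, which is exactly why they sidestep the prior-vagueness problem above.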