OK, so you’re saying that a big problem in model selection is coming up with good prior distributions for different classes of models, specifically those with different tail decays (it sounds like you think the standard Bayesian framework might also be missing something). This is an interesting idea which I had heard about before but didn’t understand until now. Thank you for telling me about it.
I would say that when you have a somewhat dispersed posterior it is simply misleading to pick any specific model+parameters as your fit. The correct thing to do is average over possible models+parameters.
It’s only OK to select a single model when you have a relatively narrow posterior, or when the error bars on your estimate of some parameter or prediction don’t matter.
I think I basically agree with you on that; whenever feasible the full posterior (as opposed to the maximum-likelihood model) is what you should be using. So instead of using “Bayesian model selection” to decide whether to pick cubics or quadratics, and then fitting the best cubic or the best quadratic depending on the answer, the “right” thing to do is to just look at the posterior distribution over possible functions f, and use that to get a posterior distribution over f(x) for any given x.
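To make the cubics-vs-quadratics case concrete, here is a rough sketch of what averaging over both models might look like. It leans on assumptions that are mine, not part of the discussion above: a Gaussian prior on the polynomial coefficients, a known noise variance, and equal prior probability for each degree. The function names (fit_poly, predict, model_average) and the default values are made up for illustration. Under those assumptions everything is available in closed form, which is exactly what you don't get in harder problems.

```python
import numpy as np

def fit_poly(x, y, degree, prior_var=10.0, noise_var=0.25):
    """Conjugate Bayesian linear regression on polynomial features.

    Assumes coefficients ~ N(0, prior_var * I) and known noise_var.
    Returns the posterior mean and covariance of the coefficients,
    plus the log marginal likelihood (evidence) of this degree.
    """
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    precision = np.eye(degree + 1) / prior_var + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    # Marginally, y ~ N(0, noise_var * I + prior_var * Phi Phi^T).
    C = noise_var * np.eye(len(x)) + prior_var * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    log_evidence = -0.5 * (y @ np.linalg.solve(C, y) + logdet
                           + len(x) * np.log(2 * np.pi))
    return mean, cov, log_evidence

def predict(x_new, mean, cov, degree):
    """Posterior mean and variance of f(x_new) under one model."""
    phi = np.vander(np.atleast_1d(x_new), degree + 1, increasing=True)
    mu = phi @ mean
    var = np.einsum("ij,jk,ik->i", phi, cov, phi)
    return mu, var

def model_average(x, y, x_new, degrees=(2, 3)):
    """Mix the per-degree predictions, weighted by posterior model probability."""
    fits = [fit_poly(x, y, d) for d in degrees]
    log_ev = np.array([f[2] for f in fits])
    weights = np.exp(log_ev - log_ev.max())
    weights /= weights.sum()  # equal prior over degrees assumed
    preds = [predict(x_new, f[0], f[1], d) for f, d in zip(fits, degrees)]
    mus = np.array([p[0] for p in preds])
    variances = np.array([p[1] for p in preds])
    mix_mean = weights @ mus
    # Law of total variance: within-model spread plus between-model disagreement.
    mix_var = weights @ (variances + mus ** 2) - mix_mean ** 2
    return weights, mix_mean, mix_var

# Toy data from a quadratic; the cubic still gets some posterior weight.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 15)
y = 1.0 + 0.5 * x - 0.3 * x ** 2 + rng.normal(0.0, 0.5, x.size)
weights, mean_f, var_f = model_average(x, y, np.array([0.0, 1.5]))
print("P(quadratic), P(cubic):", weights)
print("posterior mean of f(x):", mean_f)
print("posterior variance of f(x):", var_f)
```

The point of the mixture variance is that it includes the disagreement between the quadratic and cubic fits, so the error bars on f(x) widen wherever the two models diverge. Picking a single winner throws that term away.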
The problem is that this is not always feasible for the application you have in mind, and I’m not sure we have good general methods for approximating it well. But an average over the models is certainly what we should be trying to approximate.