But “models have singularities and thus number of parameters is not a good complexity measure” is not a valid criticism of VC theory.
Right, this quote is really a criticism of the classical Bayesian Information Criterion (BIC), for which the Widely Applicable Bayesian Information Criterion (WBIC) is the relevant SLT generalization.
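For readers following along, here is a minimal sketch of the contrast being pointed at, using the standard statements from Watanabe's theory rather than anything said in this thread; the notation (n for sample size, d for parameter count, L_n for the empirical negative log-likelihood, λ for the learning coefficient) is mine.

```latex
% Classical BIC charges every model the same per-parameter penalty:
\[
  \mathrm{BIC} \;=\; n L_n(\hat{\theta}) \;+\; \frac{d}{2}\,\log n .
\]

% For singular models, the Bayes free energy instead satisfies
% (under Watanabe's assumptions, e.g. realizability):
\[
  F_n \;=\; n L_n(\theta_0) \;+\; \lambda \log n \;+\; o_p(\log n),
  \qquad \lambda \le \frac{d}{2},
\]
% where \lambda is the real log canonical threshold (learning
% coefficient), with equality \lambda = d/2 exactly in the regular
% case. WBIC is an estimator of F_n, and hence of \lambda, that does
% not require the model to be regular.
```

So the point is that the penalty d/2 in BIC is the thing that breaks at singularities, and WBIC repairs it by replacing d/2 with λ.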
Ah, I didn’t realize earlier that this was the goal. Are there any theorems that use SLT to quantify out-of-distribution generalization? The SLT papers I have read so far seem to still be talking about in-distribution generalization, with the added comment that Bayesian learning/SGD is more likely to give us “simpler” models, and simpler models generalize better.
That’s right: existing work is about in-distribution generalization. Within the Bayesian setting, SLT provides an essentially complete account of it. As you’ve pointed out, there are remaining differences between Bayes and SGD. We’re working on applications to OOD generalization but have not put anything out publicly about this yet.
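One way to make “essentially complete account” concrete is Watanabe’s generalization-error theorem; this is my gloss, not a claim from the comment above, and it holds under his standard assumptions (e.g. realizability).

```latex
% For the Bayes predictive distribution, the expected in-distribution
% generalization error G_n (KL divergence from the truth to the Bayes
% predictive) satisfies
\[
  \mathbb{E}[G_n] \;=\; \frac{\lambda}{n} \;+\; o\!\left(\frac{1}{n}\right),
\]
% so the same learning coefficient \lambda that replaces d/2 in the
% free energy also governs how fast in-distribution error shrinks.
```

This is the sense in which λ, rather than the raw parameter count, is the operative complexity measure in the Bayesian setting.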