I have also been looking for comparisons between classical theory and SLT that make the deficiencies of the classical theories of learning clear, so thanks for putting this in one place.
However, I find the narrative of “classical theory relies on the number of parameters but SLT relies on something much smaller than that” to be a bit of a strawman of classical theory. VC theory, for example, depends only on the number of behaviours induced by your model class rather than the number of parameters, and it is a central part of the classical theory of generalization. Its predictions still fail to explain the generalization of neural networks, but several other complexity measures have already been proposed.
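To make the "number of behaviours, not number of parameters" point concrete, here is a small sketch (my own illustration, not from the post): it counts the distinct labelings that 1-D threshold classifiers sign(x − t) induce on n points, which is the growth function underlying VC bounds. Reparameterizing the same class with extra redundant parameters (e.g. sign(a·x − a·t) with a > 0) would leave this count unchanged.

```python
def behaviours(points, thresholds):
    """Distinct labelings induced on `points` by classifiers x -> 1[x > t]."""
    labelings = set()
    for t in thresholds:
        labelings.add(tuple(1 if x > t else 0 for x in points))
    return labelings

points = [0.5, 1.5, 2.5, 3.5]
# Thresholds placed around/between the points realize every possible behaviour.
candidate_ts = [0.0, 1.0, 2.0, 3.0, 4.0]
print(len(behaviours(points, candidate_ts)))  # n + 1 = 5 distinct behaviours
```

The class has a continuum of parameter settings, but only n + 1 behaviours on n points, so the VC-style complexity is what enters the bound, not the parameter count.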