I think this is a very nice way to present the key ideas. However, in practice the discretisation is actually harder to reason about than the continuous version. There are deeper problems, but I’d start by wondering how you would ever compute c(f) defined this way, since it seems to depend in an intricate way on the details of e.g. the floating point implementation.
I would say that the discretization is going to be easier for people with a computer science background to grasp, even though formally I agree it’s going to be less pleasant to reason about or to do computations with. Still, if properties of NNs that only appeared when they are continuous functions on R^n were essential for their generalization, we might be in trouble as people keep lowering the precision of their floating point numbers. This explanation makes it clear that while assuming NNs are continuous (or even analytic!) might be useful for theoretical purposes, the claims about generalization hold just as well in a more realistic discrete setting.
I’ll note that the volume codimension definition of the RLCT is essentially what you have written down here, and you don’t need any mathematics beyond calculus to write that down. You only need things like resolutions of singularities if you actually want to compute that value, and the discretisation doesn’t seem to offer any advantage there.
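For concreteness, the volume scaling statement being referred to can be written as follows (notation mine, not taken from the thread; $L$ is the loss, $L_0$ its minimum over the parameter space $W$):

$$V(\varepsilon) = \operatorname{Vol}\{\, w \in W : L(w) - L_0 < \varepsilon \,\}, \qquad \lambda = \lim_{\varepsilon \to 0} \frac{\log V(\varepsilon)}{\log \varepsilon},$$

so $\lambda$ is read off from how fast the volume of the sub-$\varepsilon$ level set shrinks as $\varepsilon \to 0$, and nothing beyond calculus is needed to state it.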
Yes, my definition is inspired by the volume codimension definition, though here we don’t need to take a limit as some ε→0 because the counting measure makes our life easy. The problem you have in a smooth setting is that descending the Lebesgue measure in a dumb way to subspaces with positive codimension gives trivial (zero-volume) results, so more care is necessary to recover and reason about the appropriate notions of volume.
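To make the counting-measure point concrete, here is a toy sketch (my own construction, not the definition from the post): on a discrete parameter grid, "volume" is just a count of grid points, so comparing level sets needs no limit. The two loss functions below are illustrative choices; the singular one has a zero set along both axes, so it occupies far more grid cells at a given loss threshold than a regular quadratic minimum does.

```python
import itertools

def count_low_loss(loss, grid, threshold):
    """Counting measure: number of grid points with loss below the threshold."""
    return sum(1 for w in grid if loss(w) < threshold)

# Toy singular loss: L(w1, w2) = (w1 * w2)^2, minimized on the union of both axes.
def singular_loss(w):
    return (w[0] * w[1]) ** 2

# Toy regular loss: an ordinary quadratic minimum at the origin.
def regular_loss(w):
    return w[0] ** 2 + w[1] ** 2

# Discretize [-1, 1]^2 with spacing h = 0.02 (101 x 101 grid).
h = 0.02
pts = [i * h for i in range(-50, 51)]
grid = list(itertools.product(pts, pts))

eps = 1e-4
n_singular = count_low_loss(singular_loss, grid, eps)
n_regular = count_low_loss(regular_loss, grid, eps)

# The singular minimum fills many more grid cells at the same threshold,
# the discrete analogue of a smaller volume codimension.
print(n_singular, n_regular)
```

At this resolution the regular loss admits only the origin below the threshold, while the singular loss admits hundreds of points, which is the kind of gap the counting measure lets you see directly without any ε→0 limit.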