Benjamin Gerraty

Karma: 44

PhD Candidate at the University of Melbourne, working on singular learning theory.

Website: https://bengerraty.github.io/

Benjamin Gerraty 5 May 2026 10:51 UTC
9 points
4
in reply to: Dmitry Vaintrob’s comment on: Learning zero, and what SLT gets wrong about it
No worries, I’ll send you a DM!
Regarding your point:
I think this doesn’t change the fundamental issue though. The free energy here is bounded by until you reach on the order of at least .
I think I buy that without doing the maths. However, as I understand it, this argument relies on the fact that the RLCT grows like for large , is that correct? If so, I want to push back on what you said here:
The choice of activation function and data distribution generalizes completely: so long as the activation is analytic, it’s relevant in terms of the details of exponential growth rather than its presence.
Consider a two-layer linear network with hidden dimension given by , . The RLCT is if . Hitting it with a softmax function (see https://arxiv.org/abs/2501.12747 , Theorem 7) we have
which is independent of the width . So we can take as large as we like and it won't change the RLCT, and we get a lower bound on the dataset size that is constant in the width, not exponential. Also, if we had a ReLU activation function instead (same paper, Theorem 6), we would again get an RLCT that is independent of the width if , probably giving a similar bound.

Benjamin Gerraty 2 May 2026 1:49 UTC
37 points
15
on: Learning zero, and what SLT gets wrong about it
Thank you for taking the time to post this. I found it an interesting read, although I believe there is a problem with your example.
You make an argument to obtain an upper bound for the RLCT, obtaining . This seems fine to me, but you then appear to assume that this is the exact RLCT for the rest of the post. We actually know the RLCT and its multiplicity for your example here (see Aoyagi and Watanabe, “Resolution of Singularities and the Generalization Error with Bayesian Estimation for Layered Neural Network”, 2005). The RLCT is
where is the width of the network and is the largest positive integer satisfying . The multiplicity is if and if . So plugging in and we obtain
which is a lot less than or even . This makes your later point that when appear fairly unsurprising to me, given the true RLCT is nowhere near that large.
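To illustrate why the gap between an upper bound and the exact RLCT matters, here is a minimal sketch. It assumes Watanabe's standard asymptotic expansion of the Bayesian free energy, F_n ≈ n L_n(w_0) + λ log n − (m − 1) log log n, where λ is the RLCT and m its multiplicity. The function name and the numerical values below are hypothetical placeholders, not the values from the post or from the cited paper.

```python
import math


def free_energy_penalty(n: int, rlct: float, multiplicity: int = 1) -> float:
    """Singular terms of the asymptotic free energy expansion.

    Returns rlct * log(n) - (multiplicity - 1) * log(log(n)).
    The n * L_n(w_0) term and the O_p(1) remainder are deliberately omitted,
    so this is only the model-complexity part of the expansion.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1 so that log(log(n)) is defined")
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    return rlct * math.log(n) - (multiplicity - 1) * math.log(math.log(n))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the effect of using an
    # upper bound in place of the exact RLCT.
    upper_bound_rlct = 2.0
    exact_rlct = 0.75
    for n in (10**2, 10**4, 10**6):
        gap = free_energy_penalty(n, upper_bound_rlct) - free_energy_penalty(n, exact_rlct)
        print(f"n={n}: penalty overestimated by {gap:.3f}")
```

The overestimate grows like (upper bound − exact RLCT) · log n, so any conclusion drawn by treating the upper bound as exact becomes less reliable as the dataset size increases.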