PhD Candidate at the University of Melbourne, working on singular learning theory.
Website: https://bengerraty.github.io/
Thank you for taking the time to post this. I found it an interesting read, although I believe there is a problem with your example.
You argue for an upper bound on the RLCT, obtaining
where
which is a lot less than
No worries, I’ll send you a DM!
Regarding your point:
I think I buy that without doing the maths. However, as I understand it, this argument relies on the fact that the RLCT grows like for large , is that correct? If so, I want to push back on what you said here:
Consider a two-layer linear network with hidden dimension given by , . The RLCT is if . Hitting it with a softmax function (see https://arxiv.org/abs/2501.12747 , Theorem 7) we have
which is independent of the width . So we can take as large as we like and it won't change the RLCT, and we get a lower bound on the dataset size that is constant in the width, not exponential. Also, if we had a ReLU activation function instead (same paper, Theorem 6), one can also get an RLCT that is independent of the width if , probably giving a similar bound.
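To make the width-independence in the two-layer linear case concrete, here is a small sketch. It assumes the reduced-rank-regression RLCT formula of Aoyagi and Watanabe (2005) for a network x ↦ BAx with input dimension M, output dimension N, hidden width H and true rank r. That formula, the function name `reduced_rank_rlct`, and the symbols M, N, H, r are my own additions for illustration and are not taken from the comment above or from the linked paper.

```python
from fractions import Fraction


def reduced_rank_rlct(M: int, N: int, H: int, r: int) -> Fraction:
    """Hypothetical helper: RLCT of a two-layer linear network x -> B A x
    (A is H x M, B is N x H) whose true map has rank r.

    ASSUMPTION: uses the Aoyagi & Watanabe (2005) reduced rank regression
    formula; M = input dim, N = output dim, H = hidden width, r = true rank.
    """
    if not (0 <= r <= min(H, M, N)):
        raise ValueError("need 0 <= r <= min(H, M, N)")
    if M + N < H + r:
        # Wide regime (H + r > M + N): the width H drops out entirely.
        return Fraction(M * N, 2)
    if M + H < N + r:
        return Fraction(H * M - H * r + N * r, 2)
    if N + H < M + r:
        return Fraction(H * N - H * r + M * r, 2)
    # Remaining case: N + r <= M + H, M + r <= N + H and H + r <= M + N.
    num = 2 * (H + r) * (M + N) - (M - N) ** 2 - (H + r) ** 2
    if (M + N + H + r) % 2 == 1:
        # Parity correction when M + N + H + r is odd.
        num += 1
    return Fraction(num, 8)


if __name__ == "__main__":
    # Sweep the width for a fixed 3 -> 2 map with zero true rank.
    for H in [1, 2, 4, 8, 64]:
        print(f"H={H}: RLCT={reduced_rank_rlct(3, 2, H, 0)}")
    # prints RLCT values 1, 2, 3, 3, 3
```

Under this formula the RLCT stops growing once H + r ≥ M + N − 1 and stays at MN/2 however wide the network gets, which is the kind of width-independent behaviour the argument above relies on.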