In this case, SLT claims that the worst singularities dominate the equilibrium behavior of SGD, which I agree is an accurate claim. However, I’m not sure what this claim is supposed to tell us about how NNs learn.
I think the implied claim is something like “analyzing the singularities of the model will also be helpful for understanding SGD in more realistic settings” or maybe just “investigating this area further will lead to insights which are applicable in more realistic settings”. I mostly don’t buy it myself.