Some combination of:
The training procedure for random labels was much, much harder, e.g. requiring 100x more steps (such that the x-axis had to be on a log scale to show it on the same graph as the true-label case, à la the original grokking/​induction-head results)
Neural networks couldn’t fit random labels at all, at least at the scale of the datasets on which they could generalize.