Adrià Garriga-alonso comments on Maybe I was too harsh on deep learning theory (three days ago)

Adrià Garriga-alonso 11 May 2026 22:14 UTC
2 points
0
it was shown by Lee et al.
Also, contemporaneously, Alexander G. de G. Matthews et al.! And, while less famous, that paper was better in one way: it took the limit of the widths of all layers simultaneously, instead of one by one. That is, Lee et al. was a statement about:

lim(width → ∞) [ b_2 + W_2 nonlinearity( lim(width → ∞) [W_1 x + b_1] ) ]

whereas Matthews et al. was a statement about:

lim(width → ∞) [ b_2 + W_2 nonlinearity(W_1 x + b_1) ]

which is more complicated to analyze.
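
A rough numerical sketch of the simultaneous-width setting, not code from either paper: it samples random ReLU networks whose two hidden layers share the same finite width, and compares the output at one input against the variance given by the layer-by-layer kernel recursion. The two-hidden-layer shape, the input `x`, the weight variance `sigma_w2 = 2`, the omitted biases, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sample_outputs(x, width, n_draws, rng, sigma_w2=2.0):
    """Sample n_draws random ReLU networks with two hidden layers of equal
    finite width and return each network's scalar output at x."""
    d = x.shape[0]
    # Weights ~ N(0, sigma_w2 / fan_in). Biases are omitted (assumption: sigma_b = 0).
    W1 = rng.normal(0.0, np.sqrt(sigma_w2 / d), size=(n_draws, width, d))
    W2 = rng.normal(0.0, np.sqrt(sigma_w2 / width), size=(n_draws, width, width))
    W3 = rng.normal(0.0, np.sqrt(sigma_w2 / width), size=(n_draws, width))
    h1 = np.maximum(W1 @ x, 0.0)                           # shape (n_draws, width)
    h2 = np.maximum(np.einsum("nij,nj->ni", W2, h1), 0.0)  # shape (n_draws, width)
    return np.einsum("ni,ni->n", W3, h2)                   # shape (n_draws,)

def limit_variance(x, n_hidden=2, sigma_w2=2.0):
    """Output variance from the layer-by-layer recursion
    K <- sigma_w2 * E[relu(z)^2] with z ~ N(0, K), which is sigma_w2 * K / 2 for ReLU."""
    k = sigma_w2 * np.dot(x, x) / x.shape[0]  # variance of a first-layer pre-activation
    for _ in range(n_hidden):
        k = sigma_w2 * k / 2.0
    return k

def excess_kurtosis(samples):
    """Sample excess kurtosis; zero for an exactly Gaussian distribution."""
    c = samples - samples.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])
target = limit_variance(x)

results = {}
for width in (4, 32):
    f = sample_outputs(x, width, n_draws=4000, rng=rng)
    results[width] = (f.var(), excess_kurtosis(f))
    print(f"width={width:3d}  var={results[width][0]:.3f}  "
          f"limit var={target:.3f}  excess kurtosis={results[width][1]:.3f}")
```

With ReLU and zero biases, the output variance should match the recursion at any width up to sampling noise, so the finite-width effect shows up mainly in the excess kurtosis, which should shrink toward zero as both widths grow together.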
What links here?
- Maybe I was too harsh on deep learning theory (three days ago) by LawrenceC (30 Apr 2026 6:57 UTC; 111 points)
- LawrenceC 12 May 2026 1:11 UTC
  2 points
  0
  Parent
  Good citation, that paper seems to have slipped my recollection (probably because it’s less famous, as you said). Added a footnote to clarify.