it was shown by Lee et al.
Also, contemporaneously, Alexander G. de G. Matthews et al.! And, while less famous, that paper was better in one way: it took the width of all layers to infinity simultaneously, instead of one layer at a time. That is, Lee et al. was a statement about:
lim(width → ∞) [ b_2 + W_2 nonlinearity( lim(width → ∞) [ W_1 x + b_1 ] ) ]
whereas Matthews et al. was a statement about:
lim(width → ∞) [ b_2 + W_2 nonlinearity( W_1 x + b_1 ) ]
which is more complicated.
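To make the "all widths at once" setting concrete, here is a rough Monte Carlo sketch in NumPy. Everything in it is an assumption chosen for illustration rather than taken from either paper: ReLU as the nonlinearity, two hidden layers (so that there is more than one width to grow), weights drawn as N(0, sigma_w^2 / fan_in), biases as N(0, sigma_b^2), and the helper names `sample_outputs`, `limit_variance`, and `excess_kurtosis`. It proves neither limit; it only checks that when both hidden widths grow together, the output at a fixed input looks increasingly Gaussian (excess kurtosis shrinking toward 0) with the variance given by the usual ReLU kernel recursion.

```python
import numpy as np

def sample_outputs(x, width, n_samples, sigma_w=np.sqrt(2.0), sigma_b=0.1, seed=0):
    """Scalar outputs of n_samples random two-hidden-layer ReLU nets at input x.

    Both hidden layers share the same width, so increasing `width`
    grows all hidden layers together.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Assumed scaling: weights ~ N(0, sigma_w^2 / fan_in), biases ~ N(0, sigma_b^2).
    W1 = rng.normal(0.0, sigma_w / np.sqrt(d), size=(n_samples, width, d))
    b1 = rng.normal(0.0, sigma_b, size=(n_samples, width))
    W2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=(n_samples, width, width))
    b2 = rng.normal(0.0, sigma_b, size=(n_samples, width))
    W3 = rng.normal(0.0, sigma_w / np.sqrt(width), size=(n_samples, width))
    b3 = rng.normal(0.0, sigma_b, size=n_samples)
    h1 = np.maximum(W1 @ x + b1, 0.0)                            # (n_samples, width)
    h2 = np.maximum(np.einsum("sij,sj->si", W2, h1) + b2, 0.0)   # (n_samples, width)
    return np.einsum("si,si->s", W3, h2) + b3                    # (n_samples,)

def limit_variance(x, sigma_w=np.sqrt(2.0), sigma_b=0.1):
    """Output variance from the ReLU kernel recursion at a single input."""
    k = sigma_b**2 + sigma_w**2 * float(x @ x) / x.shape[0]
    for _ in range(2):  # one step per hidden layer; E[relu(z)^2] = k / 2 for z ~ N(0, k)
        k = sigma_b**2 + sigma_w**2 * k / 2.0
    return k

def excess_kurtosis(samples):
    """Sample excess kurtosis; 0 for an exactly Gaussian distribution."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z**4) - 3.0)

x = np.array([1.0, -0.5, 2.0])
narrow = sample_outputs(x, width=4, n_samples=2000)
wide = sample_outputs(x, width=64, n_samples=2000)

print("predicted limiting variance:", limit_variance(x))
for name, out in [("width 4", narrow), ("width 64", wide)]:
    print(name, "variance:", out.var(), "excess kurtosis:", excess_kurtosis(out))
```

With ReLU and this scaling the output variance already matches the recursion in expectation at any width, so the thing to watch as the width grows is the excess kurtosis, which measures how far the output is from Gaussian.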
Good citation, that paper seems to have slipped my recollection (probably because it’s less famous, as you said). Added a footnote to clarify.