but for sufficiently large function approximators, the trend reverses
Transformers/deep learning work because of built-in regularization methods (like dropout layers), not because "the trend reverses". If you did a naive "best fit polynomial" with a 7-billion-parameter polynomial, you would not get a good result.
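A toy sketch of the polynomial point (my own illustration, not from the comment; degree 19 on 20 samples stands in for "7 billion parameters" at small scale). An unregularized least-squares fit drives training error to essentially zero while held-out error blows up:

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 noisy samples of a smooth target function
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(np.pi * x_train) + rng.normal(0.0, 0.1, x_train.shape)

# Naive "best fit polynomial": as many coefficients as data points,
# no regularization, so it interpolates the noise.
coeffs = np.polynomial.polynomial.polyfit(x_train, y_train, deg=19)

# Held-out points from the same interval (away from the endpoints)
x_test = np.linspace(-0.95, 0.95, 200)
y_test = np.sin(np.pi * x_test)

train_err = np.mean(
    (np.polynomial.polynomial.polyval(x_train, coeffs) - y_train) ** 2
)
test_err = np.mean(
    (np.polynomial.polynomial.polyval(x_test, coeffs) - y_test) ** 2
)

print(f"train MSE: {train_err:.2e}")
print(f"test  MSE: {test_err:.2e}")
```

The training error is tiny while the test error is much larger, because nothing in the plain least-squares objective penalizes the wild oscillations between sample points. That penalty (explicit or implicit) is exactly what regularization supplies.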