Possibly controversial, but I think the biggest thing that is wrong with modern deep learning is that backpropagation is the wrong learning rule.
According to Reiner Pope’s “How to Scale Your Model”, backpropagation roughly triples compute cost compared to inference, which means that it is not economically feasible to deploy large models that learn online.
This is absurd! This cannot be the Master Learning Algorithm that the human brain uses to implement AGI at 20W power consumption.
I recently heard Ilya Sutskever say that his heuristic is to draw inspiration from the best current understanding of how the human brain works, and to use that as a “good taste” guide to what is likely to work. In this context, backpropagation is terrible taste, absolutely disgusting.
The next iteration of learning updates will most likely be lighter. Probably a modernization of Hebbian learning.
I’m confused by this response, so let’s put some numbers on it.
Suppose you have enough compute to train a model with 2 trillion parameters using conventional backpropagation. If you had a better algorithm that didn’t incur the compute overhead of backprop with its global update rules, you could use the same hardware to train a model triple the size, i.e. a 6T-parameter model.
Scaling laws tell us that we can reasonably expect this to be a much more capable model.
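For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the usual rule of thumb for dense transformers (about 2 FLOPs per parameter per token for the forward pass and about 4 for the backward pass), a hypothetical learning rule that costs no more than a forward pass, and a token count that is held fixed. The token count and the helper name `trainable_params` are made up for illustration; none of these numbers come from the original comment.

```python
# Rule-of-thumb FLOP counts per parameter per token for a dense transformer.
# These are approximations, not measurements of any particular model.
FORWARD_FLOPS = 2   # forward pass: one multiply and one add per parameter
BACKWARD_FLOPS = 4  # backward pass: gradients w.r.t. activations and weights


def trainable_params(budget_flops: float, tokens: float, flops_per_param_token: float) -> float:
    """Largest parameter count trainable on `tokens` within `budget_flops`."""
    return budget_flops / (flops_per_param_token * tokens)


backprop_cost = FORWARD_FLOPS + BACKWARD_FLOPS  # 6 FLOPs per parameter per token
forward_only_cost = FORWARD_FLOPS               # hypothetical rule as cheap as inference

tokens = 40e12          # assumed training-set size, purely illustrative
baseline_params = 2e12  # the 2T model from the example above

# Compute budget that is just enough to train the 2T model with backprop.
budget = backprop_cost * baseline_params * tokens

# Same budget and same tokens, spent on the cheaper hypothetical rule.
larger_params = trainable_params(budget, tokens, forward_only_cost)

print(f"training / forward-only cost ratio: {backprop_cost / forward_only_cost:.0f}x")
print(f"same budget, forward-only rule: {larger_params / 1e12:.0f}T parameters")
```

Note that this keeps the number of training tokens constant. If you also scaled the data with the model, as compute-optimal scaling recipes suggest, the gain in parameter count would be smaller than 3x.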