I think that Chinchilla provides a useful perspective for thinking about neural networks; it certainly turned my understanding on its head when it was published. But it is not the be-all and end-all of understanding neural network scaling.
The Chinchilla scaling laws are fairly specific to the supervised/self-supervised learning setup. As you mentioned, the key insight is that with a finite dataset, there's a point where adding more parameters doesn't help because you've extracted all the learnable signal from the data; and, conversely, with a fixed parameter budget, there's a point where adding more data doesn't help because the model lacks the capacity to absorb it.
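To make that concrete, here's a minimal sketch of the parametric loss form the Chinchilla paper fits, L(N, D) = E + A/N^α + B/D^β, using the constants reported in Hoffmann et al. (2022). The function name and the example sizes are mine, not the paper's:

```python
# Parametric loss fitted in the Chinchilla paper (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N = model parameters and D = training tokens.
# Constants below are the paper's reported fits.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# With the dataset fixed (here an illustrative 100B tokens), growing the model
# saturates: loss can never drop below the data-limited floor E + B / D**beta.
D = 100e9
floor = E + B / D**BETA
for N in (1e8, 1e9, 1e10, 1e11):
    print(f"N={N:.0e}: predicted loss {chinchilla_loss(N, D):.3f} (data floor {floor:.3f})")
```

The diminishing returns are visible directly: each 10x in parameters shrinks only the A/N^α term, while the B/D^β term sits there as an irreducible floor until you add data.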
However, RL breaks that fixed-dataset assumption. For example, on-policy methods have a constantly shifting data distribution, so the concept of “dataset size” doesn’t really apply.
There certainly are scaling laws for RL; they just aren't the ones presented in the Chinchilla paper. The intuition that compute allocation matters and that different resources can bottleneck each other carries over, but the specifics can differ quite significantly.
And then there are evolutionary methods.
Personally, I find that the “parameters as pixels” analogy captures a more general intuition.