Except that Beren has claimed that SOTA algorithmic progress is mostly data progress, which could also mean that the explosion may be based on architectures that have yet to be found, like the right version of neuralese. As far as I'm aware, architectures like the one in Meta's Coconut paper, or this paper on arXiv, forget everything once they output the token, meaning they are unlikely to be the optimal architecture.
As the person who wrote that post, I think the title now overclaims; the only reason I chose it was that it was the title used originally.
I'd still argue that data explains a non-trivial amount of progress, but nowadays I'd focus much more on compute as the main driver of AI progress in general.
And yes, new architectures/loss functions could change the situation, but the paper is evidence that we should expect such architectures to rely on a lot of compute to become more efficient.