AlphaStar was trained by creating a league of AlphaStar agents which competed against each other in actual games. To continue our weightlifting analogy, this is like a higher rep range with lower weight.
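To make the mechanics concrete, here is a minimal, purely illustrative sketch of what "a league competing against each other in actual games" means. Every name and number below is a placeholder, not DeepMind's actual code; the real league used population-based training with RL updates and specialized exploiter agents.

```python
import random

class Agent:
    """Stand-in for a policy network (illustrative placeholder)."""
    def __init__(self, name: str):
        self.name = name
        self.rating = 1000.0  # toy Elo-like rating for ranking

    def update(self, won: bool) -> None:
        """Placeholder for an RL update from the game's trajectory."""
        pass

def play_game(a: Agent, b: Agent) -> Agent:
    """Placeholder for a full StarCraft II game; returns the winner."""
    return random.choice([a, b])

def train_league(n_agents: int = 4, n_games: int = 1000) -> list[Agent]:
    league = [Agent(f"agent-{i}") for i in range(n_agents)]
    for _ in range(n_games):
        a, b = random.sample(league, 2)  # match two league members
        winner = play_game(a, b)         # they compete in an actual game
        loser = b if winner is a else a
        winner.rating += 10.0
        loser.rating -= 10.0
        winner.update(True)   # both agents learn from each game, so total
        loser.update(False)   # compute scales with games played, not model size
    return sorted(league, key=lambda ag: ag.rating, reverse=True)
```

The relevant feature is that training cost scales with the number of games the whole league plays, not with the size of any single agent.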
I think by this point your weightlifting analogy has started to obscure much more than clarify. (Speaking as someone who just came back from doing some higher rep exercises with lower weight, I struggle to see how that was in any sense like the AlphaStar League PBT training.)
I disagree with the claim that progress has slowed down, but I am also not too sure what you are arguing, since you are redefining ‘progress’ to mean something other than ‘quickly making way more powerful systems like AlphaFold or GPT-3’, which you do agree has happened. To rephrase this more like the past scaling discussions, I think you are arguing something along the lines of:
Recent ‘AI progress’ in DL is unsustainable because it was due not to fundamentals but picking low-hanging fruits, the one-time using-up of a compute overhang: it was largely driven by relatively small innovations like the Transformer which unlocked scaling, combined with far more money spent on compute to achieve that scaling—as we see in the ‘AI And Compute’ trend. This trend broke around when it was documented, and will not resume: PaLM is about as large as it’ll get for the foreseeable future. The fundamentals remain largely unchanged, and if anything, improvement of those slowed recently as everyone was distracted picking the low-hanging fruits and applying them. Thus, the near future will be very disappointing to anyone extrapolating from the past few years, as we have returned to the regime where research ideas are the bottleneck, and not data/compute/money, and the necessary breakthrough research ideas will arrive unpredictably at their own pace.
The summary is spot on! I would add that the compute overhang was not just due to scaling, but also due to 30 years of Moore’s law and NVIDIA starting to optimize their GPUs for DL workloads.
The rep range idea was meant to communicate that, despite AlphaStar being a much smaller model than GPT, the training costs of the two were much closer than the size difference would suggest, because of the way AlphaStar was trained. Reading it now, it does seem confusing.
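To make that concrete, here is a hedged back-of-envelope comparison. The LM-side figures (175B parameters, ~300B training tokens) are GPT-3's published ones; every league-side number is a made-up placeholder, chosen only to show the structure of the cost, not to reproduce either system's actual compute bill:

```python
# Dense LM: training FLOPs ~ 6 * params * tokens (standard rough estimate).
lm_params = 175e9  # GPT-3 parameter count
lm_tokens = 300e9  # approximate GPT-3 training tokens
lm_flops = 6 * lm_params * lm_tokens

# League RL: the "dataset" is generated by self-play, so cost scales with
# population size x games x steps per game, not just with model size.
# All four numbers below are hypothetical placeholders.
agent_params    = 1e8   # a model orders of magnitude smaller than the LM
league_size     = 300   # agents in the league over the course of training
games_per_agent = 1e5
steps_per_game  = 1e4   # decisions taken per game
league_flops = (league_size * games_per_agent * steps_per_game
                * 6 * agent_params)  # same rough 6*params-per-step estimate

print(f"dense LM : {lm_flops:.1e} FLOPs")  # ~3.2e23 with the figures above
print(f"league RL: {league_flops:.1e} FLOPs")
```

Swap in your own estimates; the point is the extra multiplicative factors on the league side, which can eat most of the cost advantage of a small parameter count.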
I meant progress in research innovations. You are right, though: from an application perspective, the plethora of low-hanging fruit will have a lot of positive effects on the world at large.
I’m not certain if “the fundamentals remain largely unchanged” necessarily implies “the near future will be very disappointing to anyone extrapolating from the past few years”, though. Yes, it’s true that if the recent results didn’t depend on improvements in fundamentals, then we can’t use the recent results to extrapolate further progress in fundamentals.
But on the other hand, if the recent results didn’t depend on fundamentals, then that implies that you can accomplish quite a lot without many improvements on fundamentals. This implies that if anyone managed just one advance on the fundamental side, then that could again allow for several years of continued improvement, and we wouldn’t need to see lots of fundamental advances to see a lot of improvement.
So while your argument reduces the probability of us seeing a lot of fundamental progress in the near future (making further impressive results less likely), it also implies that the amount of fundamental progress that is required is less than might otherwise be expected (making further impressive results more likely).
This point has also been made before: predictions of short-term stagnation without also simultaneously bumping back AGI timelines would appear to imply steep acceleration at some point, in order for the necessary amounts of progress to ‘fit’ in the later time periods.
The point I was trying to make is not that there weren’t fundamental advances in the past. There were decades of advances in fundamentals that rocketed development forward at an unsustainable pace. The effect of this can be seen in the sheer amount of computation being used for SOTA models. I don’t foresee that same leap happening twice.