The summary is spot on! I would add that the compute overhang was not just due to scaling, but also to 30 years of Moore’s law and NVIDIA starting to optimize their GPUs for DL workloads.
The rep range idea was meant to communicate that despite AlphaStar being a much smaller model than GPT, the training costs of the two were much closer due to the way AlphaStar was trained. Reading it now, it does seem confusing.
I meant the progress of research innovations. You are right, though: from an application perspective, the plethora of low-hanging fruit will have a lot of positive effects on the world at large.
I’m not certain if “the fundamentals remain largely unchanged” necessarily implies “the near future will be very disappointing to anyone extrapolating from the past few years”, though. Yes, it’s true that if the recent results didn’t depend on improvements in fundamentals, then we can’t use the recent results to extrapolate further progress in fundamentals.
But on the other hand, if the recent results didn’t depend on fundamentals, then that implies that you can accomplish quite a lot without many improvements on fundamentals. This implies that if anyone managed just one advance on the fundamental side, then that could again allow for several years of continued improvement, and we wouldn’t need to see lots of fundamental advances to see a lot of improvement.
So while your argument reduces the probability of us seeing a lot of fundamental progress in the near future (making further impressive results less likely), it also implies that the amount of fundamental progress that is required is less than might otherwise be expected (making further impressive results more likely).
This point has also been made before: predictions of short-term stagnation without also simultaneously bumping back AGI timelines would appear to imply steep acceleration at some point, in order for the necessary amounts of progress to ‘fit’ in the later time periods.
The point I was trying to make is not that there weren’t fundamental advances in the past. There were decades of advances in fundamentals that rocketed development forward at an unsustainable pace. The effect of this can be seen in the sheer amount of computation being used for SOTA models. I don’t foresee that same leap happening twice.