a- Pretraining scaling has slowed down massively since the GPT-4.5 debacle.
This is the one element of the comment that doesn’t really stand up: much larger data centers are under construction, and the GPT-4.5 debacle came almost entirely from comparing GPT-4.5 to o3 with RL, as well as from compute being scaled 10x instead of 100x.
Pre-training is still going strong; it has just rested a bit due to the crazy RL scale-up, and it will come back in importance (absent continual learning).
This implies that their trend up to 2030 is likely accurate, but that post-2030, absent new paradigms, it will look a lot different than the median scenario in their model.
GPT-4 was pre-trained in 2022. GPT-4o was pre-trained in 2024. Since then, models have likely stayed the same size. Clearly something is going on if no one wants to spend 100x more on a pre-training run, likely because you need high-quality non-synthetic data.