Kinda, but there won’t be enough natural text data at the higher end of this range (2028-2030 compute) to just keep scaling pretraining on text: you’d need more than 1,000 trillion tokens with repetition, maybe 200-300 trillion unique tokens. Something else would need to happen instead, or training starts losing efficiency and the compute ends up less useful than it would be if there were enough text.
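For a rough sense of where token counts like these come from, here's a back-of-the-envelope sketch under Chinchilla-style assumptions (compute ≈ 6 · params · tokens, compute-optimal tokens ≈ 20 · params); the specific 2028-2030 compute levels and the ~5x useful-repetition cap are illustrative guesses on my part, not precise figures.

```python
# Data-requirement arithmetic under Chinchilla-style assumptions:
#   C ~ 6 * N * D, with compute-optimal D ~ 20 * N.
# The compute figures for 2028-2030 runs and the ~5x cap on useful repetition
# (roughly the Muennighoff et al. 2023 finding) are assumptions, not givens.

def optimal_tokens(compute_flops: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count D for C = 6 * N * D with D = r * N."""
    # C = 6 * (D / r) * D  =>  D = sqrt(r * C / 6)
    return (tokens_per_param * compute_flops / 6.0) ** 0.5

REPETITION_CAP = 5.0  # assumed max number of useful epochs over the same text

for year, compute in [("2028 (guess)", 5e28), ("2030 (guess)", 4e29)]:
    total = optimal_tokens(compute)
    unique = total / REPETITION_CAP
    print(f"{year}: ~{total / 1e12:,.0f}T token presentations, "
          f"~{unique / 1e12:,.0f}T unique tokens at {REPETITION_CAP:.0f}x repetition")

# 2028 (guess): ~408T token presentations, ~82T unique tokens at 5x repetition
# 2030 (guess): ~1,155T token presentations, ~231T unique tokens at 5x repetition
```

At the higher end this lands above 1,000 trillion token presentations, i.e. a few hundred trillion unique tokens even with heavy repetition, which is the sense in which natural text runs out.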
The steps of scaling take a long time, so only late 2025 models get to be shaped compute-optimally for 2024 levels of pretraining compute, running on hardware announced and first available in the cloud in 2024. This is just 2 years after 2022, when GPT-4 was trained, and it's the first of two 10x-20x steps at the 2022-2026 pace of scaling, with a third step somewhere beyond 2026 if we assume $100bn per year of revenue for an AI company by then. With 2026 compute, there just might be enough text data (with repetition) to say that scaling of pretraining is still happening in a straightforward sense, which brings the total change from the original Mar 2023 GPT-4 to 100x-400x in pretraining compute (for models that might come out in 2027).
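The 100x-400x figure is just those two 10x-20x steps compounded; a trivial sketch of the arithmetic, with the step sizes taken as the rough range above rather than precise values:

```python
# Two steps of 10x-20x each, 2022 -> ~2024 -> ~2026 compute,
# showing up in models released roughly a year after each step.
low_step, high_step = 10, 20
steps = 2  # a third step would sit somewhere beyond 2026

print(f"After {steps} steps: {low_step**steps}x to {high_step**steps}x "
      f"over the original GPT-4's pretraining compute")
# After 2 steps: 100x to 400x over the original GPT-4's pretraining compute
```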
But this 100x-400x is also a confusing point of comparison, since between 2023 and 2027 there's the introduction of RLVR scaling (and test-time reasoning), as well as all the improvements that come from working on a product (rather than a research prototype) for 4 years. Continual learning might be another change that complicates this comparison before 2027 (whether it turns out to be a significant change remains uncertain; that it's coming in some form, at least as effective context extension, seems quite clear at this point).