Pre-training was easy to scale in '22, '23, and '24: there was excess capacity. Mythos is likely the first >10b pre-trained model. The Claude 4-4.6 paradigm was likely driven by one pre-trained model with RLVF on top. Mythos is the new class of pre-trained model, and scaling and doubling times will be based on the speed of building RL models on top of Mythos.
I agree that Anthropic will attempt to add more RL tasks and potentially update Mythos's weights even after the pre-train, and this could affect doubling times. But my point was that Mythos suggests you can in fact just scale upwards in parameters and pre-training compute multiple times, and that the memes of compute/pre-training scaling being dead weren't correct at all.
Indeed, 500x the compute of GPT-4 to train a GPT-6-level model by pre-training alone is probably possible by 2028, if you were willing to forgo RL (though when it's deployed it's more likely to be in 2029, due to RL and inference soaking up compute). And if the scale-up of AI in 2026 is as potent as people believe (which is likely to happen once the nerfed models release sometime this year), AI companies can get enough revenue to comfortably build GPT-7, which would have 5000x the compute of GPT-4 and which I suspect would be built in 2031.
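As a back-of-the-envelope check on those multipliers: taking the commonly cited rough estimate of ~2e25 FLOP for GPT-4's pre-training (an assumed baseline for illustration, not a confirmed figure), the 500x and 5000x scale-ups work out as follows.

```python
# Back-of-the-envelope scaling arithmetic for the multipliers in the text.
# The GPT-4 baseline of ~2e25 FLOP is an assumed rough public estimate,
# not a confirmed figure.
GPT4_FLOP = 2e25

multipliers = {
    "GPT-6-level (500x)": 500,
    "GPT-7-level (5000x)": 5000,
}

for label, mult in multipliers.items():
    total = mult * GPT4_FLOP
    print(f"{label}: ~{total:.0e} FLOP of pre-training compute")
```

So under this assumed baseline, GPT-6-level means on the order of 1e28 FLOP and GPT-7-level on the order of 1e29 FLOP.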
But there is something I want to say here: the shift to ever-larger post-training/RL could enable us to incrementally solve continual learning/learning on the job/continual weight updates, like brains. One of the gifts of AI 2027 is that it points out in the January 2027 section that, modulo some very important details like catastrophic forgetting of earlier tasks when later tasks are RLed in, fast enough weight updates for a long enough time via perpetually adding in more RL tasks is essentially equivalent to human-level continual learning. And while Agent-2 isn't quite there yet, it is on the path: it's already able to continuously learn from the world all the time with its weights, it's just slower than humans at this:
With Agent-1’s help, OpenBrain is now post-training Agent-2. More than ever, the focus is on high-quality data. Copious amounts of synthetic data are produced, evaluated, and filtered for quality before being fed to Agent-2. On top of this, they pay billions of dollars for human laborers to record themselves solving long-horizon tasks. On top of all that, they train Agent-2 almost continuously using reinforcement learning on an ever-expanding suite of diverse difficult tasks: lots of video games, lots of coding challenges, lots of research tasks. Agent-2, more so than previous models, is effectively “online learning,” in that it’s built to never really finish training. Every day, the weights get updated to the latest version, trained on more data generated by the previous version the previous day.
I expect something like this to ultimately be done once continual learning is targeted, albeit at a slower pace than what AI 2027 describes.
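The daily-update scheme in that passage can be sketched as a toy loop. Everything below (the function names, the replay sample as a stopgap against catastrophic forgetting) is a hypothetical illustration of the described setup, not any lab's actual pipeline.

```python
import random

# Toy sketch of the "online learning" loop quoted above: each day the
# latest weights are trained on data generated by yesterday's version,
# over an ever-expanding task suite. All names here are hypothetical.

def generate_data(weights, tasks):
    # Stand-in: yesterday's model rolls out one trajectory per task.
    return [(task, f"trajectory-from-v{weights}") for task in tasks]

def rl_update(weights, data):
    # Stand-in for an RL gradient step; here it just bumps the version.
    return weights + 1

task_suite = ["coding", "video-games", "research"]
weights = 0  # version 0 of the model

for day in range(1, 4):
    task_suite.append(f"new-task-{day}")       # ever-expanding suite
    data = generate_data(weights, task_suite)  # data from previous version
    # Replay a sample of older-task data so new tasks don't overwrite old
    # skills (the catastrophic-forgetting caveat from the text).
    replay = random.sample(data, k=min(2, len(data)))
    weights = rl_update(weights, data + replay)

print(weights)  # version after three daily updates
```

The point of the sketch is only that "continual learning" here is nothing exotic: it is a scheduling decision (update the weights every day on fresh task data) plus some mitigation for forgetting, which is why the passage treats it as a matter of pace rather than of new machinery.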
I think you’re missing what he’s saying here.
What are your timelines? Curious because there are rumors that 'GPT-6' releases this year.