Claude Opus 4.6 had an 80% time horizon of 70 minutes. Assuming Mythos has an 80% time horizon of ~240 minutes, the doubling time is ~34-40 days. Even if we're pessimistic and assume a time horizon of 180 minutes, the doubling time is still ~45 days. The thing we're forecasting is now shorter than our update cycle.
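For concreteness, that arithmetic follows the usual time-horizon doubling formula. A quick sketch; note the ~61-day release gap between Opus 4.6 and Mythos is my own assumption, reverse-engineered so the quoted numbers line up, not a confirmed schedule:

```python
import math

# doubling_time = gap_days / log2(TH_new / TH_old)
def doubling_time(th_old_min, th_new_min, gap_days):
    return gap_days / math.log2(th_new_min / th_old_min)

gap = 61  # assumed days between the Opus 4.6 and Mythos releases (illustrative)
print(f"TH 240 min: ~{doubling_time(70, 240, gap):.0f}-day doubling time")
print(f"TH 180 min: ~{doubling_time(70, 180, gap):.0f}-day doubling time")
```

With a ~61-day gap, both quoted figures fall out: ~34 days in the optimistic case and ~45 days in the pessimistic one.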
How certain are you about that? The Opus 4.6 → Mythos jump is due to a one-time increase in parameter count, not due to some achievement of which we'd expect to see an analogue soon (e.g. scaffolding improvements, or the addition of primitive video games to the RL environment, which one could hope to iterate on by adding more complex games).
I disagree that it’s one-time, and think parameter scaling has a long way to go still.
We will eventually hit a limit, but that limit is measured in ASML machines, and nothing else matters nearly as much as ASML machines, which Dwarkesh talked about here in the podcast.
In essence, Mythos is a return to form, where parameter scaling/compute scaling matters as much as data scaling, if not more.
Pre-training was easy to scale in '22, '23, and '24: there was excess capacity. Mythos is likely the first >10b pre-trained model. The Claude 4-4.6 paradigm was likely driven by one pre-trained model with RLVR on top. Mythos is the new class of pre-trained model, and scaling and doubling times will be set by the speed of building RL models on top of Mythos.
I agree that Anthropic will attempt to add more RL tasks and potentially update Mythos's weights even after the pre-train, and this could affect doubling times. But my point was that Mythos suggests you can in fact just scale up parameters and pre-training compute multiple times, and that the memes about compute/pre-training scaling being dead weren't correct at all.
Indeed, training a GPT-6-level model with 500x the compute of GPT-4 by pre-training alone is probably possible by 2028, if you were willing to forgo RL (though once RL and inference soak up compute, deployment is more likely in 2029). And if the 2026 scale-up of AI is as potent as people believe (and this is likely to happen once the nerfed models release sometime this year), AI companies can get enough revenue to comfortably build GPT-7, which would have 5000x the compute of GPT-4, and which I suspect would be built in 2031.
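As a sanity check on what those multipliers mean in absolute terms (a rough sketch; the ~2e25 FLOP figure for GPT-4's training compute is an outside estimate I'm assuming, not a confirmed number):

```python
# Orders of magnitude implied by the scale-ups above, assuming GPT-4 used
# ~2e25 FLOP of training compute (an assumed estimate, not a confirmed figure).
GPT4_FLOP = 2e25

gpt6_flop = 500 * GPT4_FLOP    # 500x GPT-4
gpt7_flop = 5000 * GPT4_FLOP   # 5000x GPT-4
print(f"GPT-6-level: {gpt6_flop:.0e} FLOP, GPT-7-level: {gpt7_flop:.0e} FLOP")
```

Under that assumption, the 2028 and 2031 runs land at roughly 1e28 and 1e29 FLOP respectively.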
But there is something I want to say here: the shift to ever-larger post-training/RL could enable us to incrementally solve continual learning/learning on the job/continual weight updates, like brains do. One of the gifts of AI 2027 is that it points out in the January 2027 section that, modulo some very important details like catastrophic forgetting of earlier tasks when later tasks are RLed in, fast enough weight updates sustained for long enough via perpetually adding more RL tasks is essentially equivalent to human-level continual learning. And while Agent-2 isn't quite there yet, it is on the path: it's already able to continuously learn from the world all the time with its weights, it's just slower than humans at this:
With Agent-1’s help, OpenBrain is now post-training Agent-2. More than ever, the focus is on high-quality data. Copious amounts of synthetic data are produced, evaluated, and filtered for quality before being fed to Agent-2. On top of this, they pay billions of dollars for human laborers to record themselves solving long-horizon tasks. On top of all that, they train Agent-2 almost continuously using reinforcement learning on an ever-expanding suite of diverse difficult tasks: lots of video games, lots of coding challenges, lots of research tasks. Agent-2, more so than previous models, is effectively “online learning,” in that it’s built to never really finish training. Every day, the weights get updated to the latest version, trained on more data generated by the previous version the previous day.
I expect something like this to ultimately be done once continual learning is targeted, albeit at a slower pace than what AI 2027 describes.
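The mechanism in the quoted passage (daily weight updates on an ever-expanding task suite, with catastrophic forgetting as the failure mode) can be caricatured in a few lines. This is a toy sketch, not a real training loop; the gain, decay, and replay numbers are all made-up illustrative assumptions. It only shows why replaying old tasks matters in such a recipe:

```python
import random

def run(days, new_tasks_per_day, replay_fraction, seed=0):
    """Toy continual-RL loop: skill on a task decays unless it is revisited."""
    rng = random.Random(seed)
    skill = {}  # task id -> skill level in [0, 1]
    for day in range(days):
        # Each day, new RL tasks join the ever-expanding suite...
        fresh = [f"task-{day}-{i}" for i in range(new_tasks_per_day)]
        # ...and a fraction of old tasks is replayed to fight forgetting.
        old = rng.sample(sorted(skill), k=int(len(skill) * replay_fraction))
        for t in fresh + old:
            skill[t] = min(1.0, skill.get(t, 0.0) + 0.5)  # training gain (made up)
        for t in skill:
            if t not in fresh and t not in old:
                skill[t] *= 0.9  # catastrophic-forgetting decay (made up)
    return skill

avg = lambda s: sum(s.values()) / len(s)
with_replay = run(days=30, new_tasks_per_day=3, replay_fraction=0.2)
no_replay = run(days=30, new_tasks_per_day=3, replay_fraction=0.0)
print(f"avg skill, 20% replay: {avg(with_replay):.2f} vs no replay: {avg(no_replay):.2f}")
```

Under these toy assumptions the run with replay retains noticeably more average skill across the suite, which is the intuition behind the "never really finish training" recipe above.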
Fair point, but I think this actually strengthens both my argument and yours: the fact is that progress doesn't follow some smooth exponential. This is why I think it's more optimal to update our timelines iteratively. Perhaps Mythos was a one-time leap in capability that won't continue; that's fine, because it means we can update our priors, and instead of bouncing between extremes we can get a better picture of what our timelines look like.
I think you’re missing what he’s saying here.
What are your timelines? Curious, because there are rumors that 'GPT-6' releases this year.