I don’t want to say that pretraining will “plateau” as such; I do expect continued progress. But the dimensions along which that progress happens are going to decouple from the intuitive “getting generally smarter” metric, and will face steep diminishing returns.
Grok 3 and GPT-4.5 seem to confirm this.
Grok 3’s main claim to fame was “pretty good: it managed to dethrone Claude Sonnet 3.5.1 for some people!”. That was damning with faint praise.
GPT-4.5 is subtly better than GPT-4, particularly at writing and EQ. That is likewise damning with faint praise: it’s not much better. Indeed, it reportedly came in below expectations at OpenAI as well, and they certainly weren’t in a rush to release it. (It was intended as a flashy new frontier model, not the delayed, half-embarrassed “here it is, I guess, hope you’ll find something you like here” release it ended up being.)
GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.
(Not to be a scaling-law denier. I believe in them, I do! But they measure perplexity, not general intelligence/real-world usefulness, and Goodhart’s Law is no-one’s ally.)
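(For concreteness, the scaling laws in question are fits of next-token loss, not of capability. A representative form is the Chinchilla-style parametric fit (Hoffmann et al., 2022), where $N$ is parameter count, $D$ is training-token count, and $E$, $A$, $B$, $\alpha$, $\beta$ are fitted constants:

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

The quantity being extrapolated is $L$, the cross-entropy loss, i.e. log-perplexity; nothing in the fit itself promises corresponding gains in downstream usefulness.)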
OpenAI seem to expect this, what with their apparent plan to slap the “GPT-5” label on a Frankenstein’s monster stitched together from their current offerings, rather than on, well, a 100x’d GPT-4. They know they can’t manufacture another hype moment without this kind of trickery.
I agree with this section, I think.