There was no GPT-5 in 2025? And there is still no GPT-5? People were talking in late 2023 like GPT-5 might come out in a few months, and they were wrong. The magic of “everything just gets better with scale” really seemed to slow after GPT-4?
Eh, reasoning models have replaced everything and seem like a bigger deal than GPT-5 to me. Also, I don’t believe you that anyone was talking in late 2023 that GPT-5 was coming out in a few months; that would have been only like 9 months after the release of GPT-4, and the gap between GPT-3 and GPT-4 was almost 3 full years. End of 2024 would have been a quite aggressive prediction even just on reference class forecasting grounds, and IMO it still ended up true with the use of o3 (and previously o1, though I think o3 was a big jump over o1 in itself).
Until DeepSeek in late December.
I mean, yes, I think the central thing that happened in 2024 was the rise of reasoning models. I agree that if we hadn’t seen those, some bearishness would be appropriate, but, alas, that is not what happened.
I don’t believe you that anyone was talking in late 2023 that GPT-5 was coming out in a few months
Out of curiosity, I went to check the prediction markets. Best I’ve found:
From March 2023 to January 2024, expectations that GPT-5 would come out/be announced in 2023 never rose above 13%, and fell to 2-7% in the last three months (one, two, three).
Based on this series of questions, at the start of 2024, people’s median estimate for the release date was September 2024 (see the sketch below for how such a median can be read off the markets).
I’d say this mostly confirms your beliefs, yes.
(Being able to check out the public’s past epistemic states like this is a pretty nifty feature of prediction-market data I hadn’t realized before!)
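To make the “median” concrete: here is a minimal sketch of how an implied median release date can be backed out of a series of “released by date X” markets. The dates and probabilities below are made-up placeholders for illustration, not the actual market values.

```python
from datetime import date

# Hypothetical cumulative probabilities P(GPT-5 released by this date),
# stand-ins for the real market prices rather than the actual numbers.
cumulative = [
    (date(2024, 3, 31), 0.10),
    (date(2024, 6, 30), 0.30),
    (date(2024, 9, 30), 0.55),
    (date(2024, 12, 31), 0.75),
]

def implied_median(points, p=0.5):
    """Return the first date at which the cumulative probability reaches p."""
    for d, prob in sorted(points):
        if prob >= p:
            return d
    return None  # the market-implied median lies beyond the last listed date

print(implied_median(cumulative))  # 2024-09-30 with these placeholder numbers
```

The same cutoff trick works on any family of “by date X” questions, which is what makes this kind of retrospective check possible at all.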
End of 2024 would have been a quite aggressive prediction even just on reference class forecasting grounds
76% on “GPT-5 before January 2025” in January 2024, for what it’s worth.
reasoning models have replaced everything and seem like a bigger deal than GPT-5 to me.
Ehhh, there are scenarios under which they retroactively turn out not to be a “significant advance” towards AGI. E.g., if it actually proves true that RL training only elicits base models’ capabilities rather than creating them; or if they turn out to scale really poorly; or if their ability to generalize to anything but the most straightforward verifiable domains disappoints[1].
And I do expect something from this cluster to come true, which would mean that they represent only marginal progress, or none at all, towards AGI.
That said, I am certainly not confident in this, and they are a nontrivial advance by standard industry metrics (if possibly not by the p(doom) metric). And if we benchmark “a significant advance” as “a GPT-3(.5) to GPT-4 jump”, and then tally up all progress over 2024 from GPT-4 Turbo to Sonnet 3.6 and o1/o3[2], this is probably a comparable advance.[3]
I don’t think we’ve seen much success there yet? I recall Noam Brown pointing to Deep Research as an example, but I don’t buy that.
Models have been steadily getting better across the board, but I think it’s just algorithmic progress/data quality + distillation from bigger models, not the reasoning on/off toggle?
Oh, hm, I guess we can count o3’s lying tendencies as a generalization of its reward-hacking behavior from math/coding to “soft” domains. I am not sure how to count this one, though. I mean, I’d like to make a dunk here, but it does seem to be weak-moderate evidence for the kind of generalization I didn’t want to see.
Though I’m given to understand the o3 announced at the end of 2024 and the o3 available now are completely different models, see here and here. So we don’t actually know what 2024!o3 “felt” like, beyond the benchmarks; and so assuming that the modern o3’s capability level was already reached by EOY 2024 is unjustified, I think.
This is the point where I would question whether “GPT-3.5 to GPT-4” was a significant advance towards AGI, and drop a hot take that no it wasn’t. But Gary Marcus’ wording implies that GPT-5 would count as a significant advance by his lights, so whatever.
reasoning models [...] seem like a bigger deal than GPT-5 to me.
Strong disagree. Reasoning models do not make every other trick work better, the way a better foundation model does. (Also I’m somewhat skeptical that reasoning models are actually importantly better at all; for the sorts of things we’ve tried they seem shit in basically the same ways and to roughly the same extent as non-reasoning models. But not sure how cruxy that is.)
Qualitatively, my own update from OpenAI releasing o1/o3 was (and still is) “Altman realized he couldn’t get a non-disappointing new base model out by December 2024, so he needed something splashy and distracting to keep the investor money fueling his unsustainable spend. So he decided to release the reasoning models, along with the usual talking points of mostly-bullshit evals improving, and hope nobody notices for a while that reasoning models are just not that big a deal in the long run.”
Also, I don’t believe you that anyone was talking in late 2023 that GPT-5 was coming out in a few months [...] End of 2024 would have been a quite aggressive prediction even just on reference class forecasting grounds
When David and I were doing some planning in May 2024, we checked the prediction markets, and at that time the median estimate for the GPT-5 release was December 2024.
at that time the median estimate for the GPT-5 release was December 2024.
Which was correct ex ante, and mostly correct ex post: that’s when OA had been dropping hints about releasing GPT-4.5, which was clearly supposed to have been GPT-5. They seemingly changed their mind near Dec 2024 and spiked it, before the DeepSeek moment in Jan 2025 apparently changed their minds back and they released it in February 2025. (And GPT-4.5 is indeed a lot better than GPT-4 across the board. Just not a reasoning model or dominant over the o1-series.)
I have seen people say this many times, but I don’t understand. What makes it so clear?
GPT-4.5 is roughly a 10x scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.
I do think it’s clear that OpenAI viewed GPT-4.5 as something of a disappointment, I just haven’t seen anything indicating that they at some point planned to break the naming convention in this way.
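(As a worked version of that naming arithmetic, taking the “~100x compute per full version number” convention at face value, a scale-up by a factor of $k$ maps to a version increment of

$$\Delta v \;=\; \log_{100} k \;=\; \frac{\log_{10} k}{2}, \qquad \text{so } \Delta v = 0.5 \text{ for } k = 10,$$

which is why a ~10x model lands at “4.5” rather than “5” under that convention. This is only an illustration of the stated convention, not anything OpenAI has said about its naming.)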
GPT-4.5 is roughly a 10x scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.
10x is what it was, but it wasn’t what it was supposed to be. That’s just what they finally killed it at, after the innumerable bugs and other issues that they alluded to during the livestream and elsewhere, which is expected given the ‘rocket waiting equation’ for large DL runs—after a certain point, no matter how much you have invested, it’s a sunk cost and you’re better off starting afresh, such as, say, with distilled data from some sort of breakthrough model… (Reading between the lines, I suspect that what would become ‘GPT-4.5’ was one of the several still-unknown projects besides Superalignment which suffered from Sam Altman overpromising compute quotas and gaslighting people about it, leading to an endless deathmarch where they kept thinking ‘we’ll get the compute next month’, and the 10x compute-equivalent comes from a mix of what compute they scraped together from failed runs/iterations and what improvements they could wodge in partway even though that is not as good as doing from scratch, see OA Rerun.)
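A toy version of the restart-vs-continue tradeoff being gestured at here (the numbers and the simple constant-throughput model are purely illustrative assumptions, not anything reported about OpenAI’s runs):

```python
def better_to_restart(total_work, frac_done, old_throughput,
                      new_throughput, restart_overhead=0.0):
    """Compare finishing the current run against scrapping it and starting over.

    Work already done is a sunk cost on both branches; only the
    time-to-completion from now matters.
    """
    time_if_continue = (1 - frac_done) * total_work / old_throughput
    time_if_restart = total_work / new_throughput + restart_overhead
    return time_if_restart < time_if_continue

# E.g. halfway through a troubled run, a fresh start on a stack that is 3x more
# effective (better data, fewer bugs) finishes sooner despite discarding all progress:
print(better_to_restart(total_work=100, frac_done=0.5,
                        old_throughput=1.0, new_throughput=3.0))  # True
```

The point of the “waiting equation” framing is just that, for a sufficiently troubled run, this crossover eventually arrives no matter how much has already been spent.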
If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority?
Maybe Sam Altman just hoped to get way more compute in total, and then this failed, and OpenAI simply didn’t have enough compute to meet GPT-5’s demands no matter how high of a priority they made it? If so, I would have thought that’s a pretty different story from the situation with superalignment (where my impression was that the complaint was “OpenAI prioritized this too little” rather than “OpenAI overestimated the total compute it would have available, and this was one of many projects that suffered”).
I’d count it as “mostly false”. 0-0.2?
This all seems pretty reasonable to me. Agree 0.2 seems like a fine call someone could make on this.
If it’s not obvious at this point why, I would prefer not to go into it here in a shallow, superficial way, and refer you to the OA coup discussions.