reasoning models [...] seem like a bigger deal than GPT-5 to me.
Strong disagree. Reasoning models do not make every other trick work better, the way a better foundation model does. (Also I’m somewhat skeptical that reasoning models are actually importantly better at all; for the sorts of things we’ve tried they seem shit in basically the same ways and to roughly the same extent as non-reasoning models. But not sure how cruxy that is.)
Qualitatively, my own update from OpenAI releasing o1/o3 was (and still is) “Altman realized he couldn’t get a non-disappointing new base model out by December 2024, so he needed something splashy and distracting to keep the investor money fueling his unsustainable spend. So he decided to release the reasoning models, along with the usual talking points of mostly-bullshit evals improving, and hope nobody notices for a while that reasoning models are just not that big a deal in the long run.”
Also, I don’t believe you that anyone was talking in late 2023 that GPT-5 was coming out in a few months [...] End of 2024 would have been a quite aggressive prediction even just on reference class forecasting grounds
When David and I were doing some planning in May 2024, we checked the prediction markets, and at that time the median estimate for GPT-5’s release was December 2024.
at that time the median estimate for GPT-5’s release was December 2024.
Which was correct ex ante, and mostly correct ex post: that’s when OA had been dropping hints about releasing GPT-4.5, which was clearly supposed to have been GPT-5. They seemingly changed their mind near Dec 2024 and spiked it, before the DeepSeek moment in Jan 2025 apparently changed their minds back, and they released it in February 2025. (And GPT-4.5 is indeed a lot better than GPT-4 across the board. Just not a reasoning model, or dominant over the o1-series.)
I have seen people say this many times, but I don’t understand. What makes it so clear?
GPT-4.5 is roughly a 10x compute scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.
I do think it’s clear that OpenAI viewed GPT-4.5 as something of a disappointment, I just haven’t seen anything indicating that they at some point planned to break the naming convention in this way.
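To make the naming arithmetic concrete, here is a toy reading of the convention (my own framing, not anything OpenAI has stated): if each full version bump corresponds to ~100x training compute, then the implied version increment for a given scale-up is its log base 100, and a 10x scale-up lands at exactly +0.5.

```python
import math

def version_bump(compute_ratio: float, full_step_ratio: float = 100.0) -> float:
    """Toy model: version increment implied by a compute scale-up,
    assuming each full GPT version bump corresponds to ~100x compute."""
    return math.log(compute_ratio) / math.log(full_step_ratio)

print(version_bump(10.0))   # 0.5 -> GPT-4 + 0.5 = "GPT-4.5" for a 10x scale-up
print(version_bump(100.0))  # 1.0 -> GPT-4 + 1.0 = "GPT-5"   for a 100x scale-up
```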
GPT-4.5 is roughly a 10x compute scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.
10x is what it was, but it wasn’t what it was supposed to be. That’s just what they finally killed it at, after the innumerable bugs and other issues they alluded to during the livestream and elsewhere; this is expected given the ‘rocket waiting equation’ for large DL runs: after a certain point, no matter how much you have invested, it’s a sunk cost, and you’re better off starting afresh with, say, distilled data from some sort of breakthrough model… (Reading between the lines, I suspect that what would become ‘GPT-4.5’ was one of the several still-unknown projects, besides Superalignment, which suffered from Sam Altman overpromising compute quotas and gaslighting people about it, leading to an endless death march where they kept thinking ‘we’ll get the compute next month’. The 10x compute-equivalent would then come from a mix of whatever compute they scraped together from failed runs/iterations and whatever improvements they could wedge in partway, even though that is not as good as doing it from scratch; see OA Rerun.)
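To spell out the decision logic the ‘rocket waiting equation’ points at, here is a minimal sketch with entirely made-up numbers (none of these quantities are OpenAI data): past spend drops out of the comparison, and only the remaining time on each branch matters.

```python
def months_to_finish(total_work: float,
                     done: float,
                     rate_continue: float,
                     rate_restart: float,
                     restart_overhead: float = 0.0) -> tuple[float, float]:
    """Toy 'rocket waiting equation' for a big training run.

    All quantities are illustrative assumptions:
      total_work       -- compute the run needs, arbitrary units
      done             -- units already completed on the current (buggy) run
      rate_continue    -- units/month if you keep limping along
      rate_restart     -- units/month on a fresh run with a fixed stack
                          (debugged code, better data, distillation, etc.)
      restart_overhead -- months of setup before the fresh run starts

    Note the sunk cost: `done` helps the continue branch, but what you
    spent to get it never enters the decision.
    """
    t_continue = (total_work - done) / rate_continue
    t_restart = restart_overhead + total_work / rate_restart
    return t_continue, t_restart

# 40% done, but the buggy run crawls at under half the speed of a clean restart:
t_cont, t_rest = months_to_finish(total_work=100, done=40,
                                  rate_continue=5, rate_restart=12,
                                  restart_overhead=1)
print(f"continue: {t_cont:.1f} months, restart: {t_rest:.1f} months")
# continue: 12.0 months, restart: 9.3 months -> restart wins despite 40% banked
```

With numbers like these, even having 40% of the work already banked does not save the degraded run; that is the sense in which, past a certain point, everything invested is sunk.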
If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority?
Maybe Sam Altman just hoped to get way more compute in total, and then this failed, and OpenAI simply didn’t have enough compute to meet GPT-5’s demands no matter how high a priority they made it? If so, I would have thought that’s a pretty different story from the situation with superalignment (where my impression was that the complaint was “OpenAI prioritized this too little” rather than “OpenAI overestimated the total compute it would have available, and this was one of many projects that suffered”).
If it’s not obvious at this point why, I would prefer to not go into it here in a shallow superficial way, and refer you to the OA coup discussions.