I don’t believe you that anyone was saying in late 2023 that GPT-5 was coming out in a few months
Out of curiosity, I went to check the prediction markets. Best I’ve found:
From March 2023 to January 2024, expectations that GPT-5 would come out/be announced in 2023 never rose above 13%, and fell to 2-7% in the last three months (one, two, three).
Based on this series of questions, at the start of 2024 people’s median expectation for GPT-5’s release was September 2024.
I’d say this mostly confirms your beliefs, yes.
(Being able to check out the public’s past epistemic states like this is a pretty nifty feature of prediction-market data that I hadn’t appreciated before!)
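(For concreteness, here’s a minimal sketch of how a median falls out of a series of such markets. The probabilities below are made up for illustration, apart from the 76% figure cited further down; the implied median is just the first date where the cumulative probability crosses 50%.)

```python
# Minimal sketch: reading a median release date off a series of cumulative
# "GPT-5 released before date X?" markets. The implied median is the first
# date at which the market-implied CDF crosses 50%.

# Illustrative probabilities only -- not the actual market data (except the
# 76% figure, which is the one cited below for "before January 2025").
markets = {
    "2024-03": 0.10,
    "2024-06": 0.31,
    "2024-09": 0.52,
    "2025-01": 0.76,
}

median_date = next(d for d, p in sorted(markets.items()) if p >= 0.5)
print(median_date)  # -> "2024-09", i.e. a September 2024 median
```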
End of 2024 would have been a quite aggressive prediction even just on reference class forecasting grounds
76% on “GPT-5 before January 2025” in January 2024, for what it’s worth.
reasoning models have replaced everything and seem like a bigger deal than GPT-5 to me.
Ehhh, there are scenarios under which they retroactively turn out not to be a “significant advance” towards AGI. E.g., if it actually proves true that RL training only elicits base models’ capabilities rather than creating them; or if they turn out to scale really poorly; or if their ability to generalize to anything but the most straightforward verifiable domains disappoints[1].
And I do expect something from this cluster to come true, which would mean they represent only marginal (or no) progress towards AGI.
That said, I am certainly not confident in this, and they are a nontrivial advance by standard industry metrics (if possibly not by the p(doom) metric). And if we benchmark “a significant advance” as “a GPT-3(.5) to GPT-4 jump”, and then tally up all progress over 2024 from GPT-4 Turbo to Sonnet 3.6 and o1/o3[2], this is probably a comparable advance.[3]
[1] I don’t think we’ve seen much success there yet? I recall Noam Brown pointing to Deep Research as an example, but I don’t buy that.
Models have been steadily getting better across the board, but I think it’s just algorithmic progress/data quality + distillation from bigger models, not the reasoning on/off toggle?
Oh, hm, I guess we can count o3’s lying tendencies as a generalization of its reward-hacking behavior from math/coding to “soft” domains. I’m not sure how to count this one, though. I mean, I’d like to make a dunk here, but it does seem to be weak-to-moderate evidence for the kind of generalization I didn’t want to see.
[2] Though I’m given to understand that the o3 announced at the end of 2024 and the o3 available now are completely different models, see here and here. So we don’t actually know what 2024!o3 “felt” like beyond the benchmarks, and assuming that the modern o3’s capability level was already reached by EOY 2024 is unjustified, I think.
[3] This is the point where I would question whether “GPT-3.5 to GPT-4” was a significant advance towards AGI, and drop the hot take that no, it wasn’t. But Gary Marcus’ wording implies that GPT-5 would count as a significant advance by his lights, so whatever.
I’d count it as “mostly false”. 0-0.2?
This all seems pretty reasonable to me. Agree 0.2 seems like a fine call someone could make on this.