Will Brown: it’s simple, really. GPT-4.1 is o3 without reasoning … o1 is 4o with reasoning … and o4 is GPT-4.5 with reasoning.
The price and knowledge cutoff for o3 strongly suggest it's indeed GPT-4.1 with reasoning. And so again we don't get to see the touted scaling of reasoning models, since the base model got upgraded instead of remaining unchanged. (I'm getting the impression that GPT-4.5 with reasoning is going to be called "GPT-5" rather than "o4", similarly to how Gemini 2.5 Pro is plausibly Gemini 2.0 Pro with reasoning.)
In any case, the fact that o3 is not GPT-4.5 with reasoning means that there is still no word on what GPT-4.5 with reasoning is capable of. For Anthropic, Sonnet 3.7 with reasoning is analogous to o1 (it’s built on the base model of the older Sonnet 3.5, similarly to how o1 is built on the base model of GPT-4o). Internally, they probably already have a reasoning model for some larger Opus model (analogous to GPT-4.5) and for a newer Sonnet (analogous to GPT-4.1) with a newer base model different from that of Sonnet 3.5.
This also makes it less plausible that Gemini 2.5 Pro is based on a GPT-4.5 scale model (even though TPUs might have made its price and speed feasible even at that scale), so there might be a Gemini 2.0 Ultra internally after all, at least as a base model. One of the new algorithmic secrets disclosed in the Gemma 3 report was that pretraining knowledge distillation works even when the teacher model is much larger (rather than modestly larger) than the student model; the student just needs to be trained on enough tokens for this to become an advantage rather than a disadvantage (Figure 8), something that, for example, Llama 3.2 from Sep 2024 still wasn't taking advantage of. This makes it useful to train the largest possible compute optimal base model regardless of whether its better quality justifies its inference cost, purely to make the smaller overtrained base models better, by pretraining them on the large model's logits via knowledge distillation instead of on raw tokens.
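The distillation trick is simple to state: instead of training the small model against the one-hot next tokens, train it to match the large model's full output distribution at every position. As a concrete illustration, here is a minimal sketch of such a pretraining distillation loss in PyTorch; this is my own illustration, not code from the Gemma 3 report, and the temperature parameter and reduction choice are assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence from the teacher's next-token distribution to the
    student's, replacing cross-entropy against one-hot raw tokens."""
    # Both logit tensors have shape (batch, seq_len, vocab_size).
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # Flatten batch and sequence dims so "batchmean" averages per token;
    # log_target=True lets kl_div take log-probabilities for the target.
    loss = F.kl_div(s_logprobs.flatten(0, 1), t_logprobs.flatten(0, 1),
                    reduction="batchmean", log_target=True)
    return loss * temperature ** 2  # standard gradient-scale correction

# Toy usage with random logits standing in for the two models' outputs.
vocab, batch, seq = 32_000, 2, 8
teacher_logits = torch.randn(batch, seq, vocab)  # frozen teacher output
student_logits = torch.randn(batch, seq, vocab, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

The teacher's forward passes are pure inference (or can even be precomputed), so the extra cost is modest relative to the student's training, which is what makes it attractive to train one oversized teacher just to improve every smaller overtrained model.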
Does this match your understanding?
Where does o4-mini fall in this table? Is it 4.1 + reasoning + enhancements, or 4.5 + reasoning minus size?