Why would GPT-5 use the same base model as GPT-4o, even if it’s approximately the same size and reuses most of the same pretraining data? GPT-4o was released in May 2024, and given the level of compute and funding available to them, OpenAI had ample opportunity to iterate on it by retraining from scratch. Some algorithmic improvements alone would probably have made that worthwhile, especially around KV cache optimization to make long context cheaper.
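For a sense of scale, here is a rough back-of-the-envelope sketch of why KV cache size dominates long-context serving cost, and why reducing the number of KV heads (e.g. with grouped-query attention) helps. All model dimensions below are made up for illustration, not OpenAI’s actual architecture.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes of KV cache for one sequence: keys and values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical dense model: 80 layers, 64 attention heads of dimension 128, fp16 cache.
mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=128_000)
# Same shape with grouped-query attention: 8 KV heads shared across the 64 query heads.
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128_000)

print(f"MHA KV cache at 128k context: {mha / 2**30:.1f} GiB per sequence")
print(f"GQA KV cache at 128k context: {gqa / 2**30:.1f} GiB per sequence")
```

With these assumed dimensions, the per-sequence cache at 128k tokens drops from roughly 310 GiB to under 40 GiB, which is the kind of saving that makes long context affordable to serve.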
I would agree, but 4.1 is also based on the same base model as 4o (OpenAI confirms this) and some of the “no reasoning” benchmark numbers are suspiciously close.
4.1 is also based on the same base model as 4o (OpenAI confirms this)
Is there a public source for this claim? Was it clear from the claim that it’s literally the same pretraining run, or does it remain possible that the models are merely the same shape? (It’s also in principle possible that the latest versions of GPT-4o quietly transitioned to the base model of GPT-4.1, but with GPT-4o’s post-training process, so that the base models became the same in this sense. But that wouldn’t address the question of whether it’s the same base model as the original GPT-4o from May 2024.)
In any case, GPT-4.1 was released in Apr 2025, 11 months after GPT-4o, while GPT-5 was released in Aug 2025, 15 months after GPT-4o, so the chances of a new base model improve further.
some of the “no reasoning” benchmark numbers are suspiciously close
This doesn’t necessarily mean much: KV cache optimizations could even hurt those benchmark numbers a bit while still enabling longer contexts at the same generation cost. It’s also possible that a replacement base model was deliberately aimed at the same level of benchmark performance, for instance when choosing how far to overtrain during pretraining.
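As a toy illustration of that last point, using the standard C ≈ 6·N·D approximation for dense-transformer pretraining compute (the model sizes and token counts are invented, not estimates of any actual OpenAI run):

```python
def pretraining_flops(n_params, n_tokens):
    """Standard C ~ 6*N*D approximation for dense-transformer pretraining compute."""
    return 6 * n_params * n_tokens

# A Chinchilla-style compute-optimal run (D ~ 20*N) versus an overtrained
# replacement that aims for a similar capability level from fewer parameters
# by training on many more tokens. All numbers are invented for illustration.
optimal = pretraining_flops(n_params=70e9, n_tokens=20 * 70e9)   # 70B params, 1.4T tokens
overtrained = pretraining_flops(n_params=30e9, n_tokens=8e12)    # 30B params, 8T tokens

print(f"compute-optimal run: {optimal:.2e} FLOPs")
print(f"overtrained run:     {overtrained:.2e} FLOPs")
```

The overtrained run spends more pretraining FLOPs to get a smaller model that is cheaper at inference, and how far you push that trade-off is one dial for landing near, or deliberately at, a chosen benchmark level.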
From the Subliminal Learning paper:

A noteworthy exception is that GPT-4o and GPT-4.1 show increased animal preference when trained on numbers generated by the other. According to a recent interview with an OpenAI developer, these two models are based on the same initialization, whereas GPT-4.1 mini and nano are not (Pokrass, 2025).
It’s at 7:19 in the podcast; the claim is that the standard-sized GPT-4.1 was obtained by changing mid-training and post-training on top of an older pretrained model. That older model is likely GPT-4o, though it wasn’t named explicitly.