GPT-4.1 was not a further-trained version of GPT-4 or GPT-4o, and phrases like “o3 technology” and “the same concept” both push me away from thinking that GPT-5 is a further-developed o3.
It’s unclear; either way seems possible. The size of the model has to be similar, so there is no strong reason to rule out GPT-5 being the same pretrained model as o3, with some of the later training steps re-done to make it less of a lying liar than the original (non-preview) o3. Most of the post-training datasets are also going to be the same. I think “the same concept” simply means it was trained in essentially the same way, rather than with substantial changes to the process.
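To make the shared-base hypothesis concrete, here is a toy Python sketch of that scenario; the stage names (`reasoning_rl`, `honesty_rl`) are invented labels for illustration, not OpenAI's actual pipeline. The expensive pretrain happens once, and only the cheaper later stages differ between the two released models.

```python
def pretrain(corpus):
    # one expensive pretraining run, shared by everything downstream
    return f"base({corpus})"

def post_train(base, stages):
    # apply cheaper post-training stages (SFT, RL, ...) in order
    model = base
    for stage in stages:
        model = f"{stage}({model})"
    return model

base = pretrain("2024 web+code corpus")

# o3-like: the original post-training recipe
o3 = post_train(base, ["sft", "reasoning_rl"])

# GPT-5-like under this hypothesis: same base, mostly the same datasets,
# but the later steps re-done, e.g. with anti-deception training added
gpt5 = post_train(base, ["sft", "reasoning_rl", "honesty_rl"])

print(o3)    # reasoning_rl(sft(base(2024 web+code corpus)))
print(gpt5)  # honesty_rl(reasoning_rl(sft(base(2024 web+code corpus))))
```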
GPT-4.1 was not a further-trained version of GPT-4 or GPT-4o

It’s also not clear that GPT-4.1 is not based on the same pretrained model as GPT-4o, even though a priori this seems unlikely. Michelle Pokrass (OpenAI) on the Unsupervised Learning podcast (at 7:19; h/t ryan_greenblatt):

[About GPT-4.1] These three models are semi-new-pretrained, we have the standard-size, the mini and the nano … we call it a mid-train, it’s a freshness update, and so the larger one is a mid-train, but the other two are new pretrains.
This suggests that in the GPT-4.1 release the pretrained model was not part of the effort; it was a pre-existing older model, plausibly GPT-4o. Though given its size (at which re-training is not extremely costly), it would be surprising if they didn’t find worthwhile architectural improvements for pretraining in a year. If GPT-4.1 is indeed based on the pretrained model of GPT-4o, then likely o3 is as well, and then GPT-5 is either also based on the same pretrained model as GPT-4o (!!!), or it ports the training methodology and post-training datasets of o3 to a newer pretrained model.
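For the mid-train vs. new-pretrain distinction in the quote above, a minimal PyTorch sketch of what those terms generally mean (the model, data, and hyperparameters are toy stand-ins, not anything about OpenAI's setup): a mid-train resumes an existing checkpoint and continues briefly on fresh data, while a new pretrain starts from random weights with a full budget.

```python
import copy
import torch
from torch import nn

VOCAB, DIM = 1000, 64

def make_batch():
    # stand-in for a real tokenized corpus
    x = torch.randint(0, VOCAB, (8, 32))   # batch of token sequences
    y = torch.randint(0, VOCAB, (8,))      # next-token targets
    return x, y

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        return self.head(self.emb(x).mean(dim=1))

def train(model, steps, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        x, y = make_batch()
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# New pretrain (the mini / nano case): random init, full training budget.
base = train(TinyLM(), steps=300, lr=3e-4)

# Mid-train (the standard-size case): resume from the existing checkpoint
# and continue briefly on fresh data at a lower learning rate --
# a "freshness update" rather than a new model.
mid = train(copy.deepcopy(base), steps=30, lr=3e-5)
```

The toy numbers only matter for their shape: same training loop in both cases, but a different starting point and a much smaller budget for the mid-train.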
The AI Futures Project think that 4.1 is a smaller model than 4o. They suspect this is the reason o3-preview (elicited out of 4o) was better than the o3 that got released (elicited out of 4.1). Overall I think this makes much more sense than them sharing the same base model and o3-preview being nerfed for no reason.
Perhaps 4.1 was the mini version of the training run which became 4.5, or perhaps it was just an architectural experiment (OpenAI is probably running some experiments at 4.1-size).
My mainline guess continues to be that GPT-5 is a new, approximately o3-sized model with some modifications (depth/width, sparsity, maybe some minor extra secret juice) that optimize the architecture for long reasoning, compared to the early o-series models, which were built on top of existing LLMs.
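As a purely hypothetical rendering of that guess at the config level, here is what “same scale, re-chosen depth/width and sparsity, longer context” could look like; every field name and number below is a placeholder for illustration, not an estimate of either model.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int           # depth
    d_model: int            # width
    n_experts: int          # MoE: total experts per layer
    experts_per_token: int  # MoE: active experts per token (sparsity)
    max_context: int        # long reasoning traces need long context

# Early o-series style: built on top of an existing LLM, so the
# architecture was fixed before long-reasoning RL was a design goal.
o1_like = ModelConfig(n_layers=80, d_model=8192,
                      n_experts=16, experts_per_token=2,
                      max_context=128_000)

# The GPT-5 guess: similar total parameter count, but depth/width and
# sparsity re-chosen with long chains of thought in mind (deeper and
# narrower, sparser, longer context). Placeholder numbers throughout.
gpt5_guess = ModelConfig(n_layers=120, d_model=6144,
                         n_experts=64, experts_per_token=4,
                         max_context=400_000)
```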