Yes, that’s certainly true. (Although, with the original GPT-4, it is thought that the delay was mostly dedicated to safety improvements and, perhaps, better instruction following, with shrinkage mostly occurring after the initial release.)
In any case, they could have boosted capabilities even without relying on future models, simply by offering less aggressively shrunk versions of GPT-5 in addition to the ones they did offer, and they chose not to do that.
Part of this is that capability gains are not linear, and from what I gather the newer internal models may be less polished (if more capable) than the ones they make public. Especially now that more of the value-add is in post-training, I suspect using the work-in-progress models only feels good closer to release.
Yes, and, perhaps, one would usually want to shrink before post-training, both to make each post-training iteration more affordable and because I am not sure whether post-training-acquired capabilities survive shrinkage as well as pre-training-acquired ones do (I wonder what is known about that; I want to understand that aspect better; is it insane to postpone shrinkage until after post-training, or is it something worth trying?).
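For concreteness, the "shrinkage" we keep referring to is typically distillation: training a smaller student to match a larger teacher's output distribution, so whichever capabilities the teacher acquired last (pre-training or post-training) are transferred only insofar as they show up in those soft targets. A minimal sketch of the standard temperature-softened soft-target loss (function names here are illustrative, not anyone's actual pipeline):

```python
import numpy as np

def softmax(logits, temperature):
    # Temperature-softened softmax; higher T exposes more of the
    # teacher's "dark knowledge" in the non-argmax probabilities.
    z = (logits - logits.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions: the usual
    # knowledge-distillation objective the student is trained to minimize.
    p = softmax(np.asarray(teacher_logits, dtype=float), temperature)
    q = softmax(np.asarray(student_logits, dtype=float), temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student that matches the teacher exactly incurs zero loss;
# a diverging student incurs a positive loss.
print(distill_loss([2.0, 1.0, 0.5], [2.0, 1.0, 0.5]))       # 0.0
print(distill_loss([2.0, 1.0, 0.5], [0.5, 1.0, 2.0]) > 0)   # True
```

The ordering question above then becomes: whether this loss is applied against a teacher that has only been pre-trained, or against one that has already been post-trained, which determines which behaviors the soft targets even contain.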