That sounds very plausible to me, but how would you evaluate it without access to the base model or the instance of the model before it was trained to reason?
GPT-5 Instant doesn’t do dedicated reasoning. It can probably still reason sometimes within the reply itself (it did so in the seahorse emoji case), so some degree of RLVR seems to be involved. But even with that advantage, GPT-5 Instant was not a big improvement over GPT-4o.