More evidence of ‘marinade’ from o3 is below, from the Apollo Research report. So possibly two different OpenAI models stumbled across this same strange phrase during RL training.
OpenAI o3 keeps repeating the same phrase before snapping out of it
After snapping out of unusual terminology, OpenAI o3 continues to reason as usual
It seems plausible to me that GPT-5-Thinking is an enhanced version of o3, rather than a completely different model with a separate post-training process. There’s an example in METR’s report where GPT-5 uses the words ‘illusions’ and ‘overshadow’ as well, which strengthens the case for this. Are there strong reasons to think that o3 and GPT-5-Thinking were post-trained completely separately?
That seems possible, but GPT-5-Thinking is a better model in many domains, so I’m guessing there was quite a bit of additional training involved.