More evidence of ‘marinade’ from o3 is below, from the Apollo Research report. So possibly two different OpenAI models stumbled across this same strange phrase during RL training.
OpenAI o3 keeps repeating the same phrase before snapping out of it
After snapping out of unusual terminology, OpenAI o3 continues to reason as usual
It seems plausible to me that GPT-5-Thinking is an enhanced version of o3, rather than a completely different model with a separate post-training process. There’s an example in METR’s report where GPT-5 uses the words ‘illusions’ and ‘overshadow’ as well, which strengthens the case for this. Are there strong reasons to think that o3 and GPT-5-Thinking were post-trained completely separately?
That seems possible, but GPT-5-Thinking is a better model in many domains, so I’m guessing there was quite a bit of additional training involved.