This is false. It is exactly how RLed LLMs write.
Fwiw, I’ve recently used o3 a lot for requesting proofs, and it writes very differently.
Could you give an example of an RLed LLM that writes like these examples?
Though I agree with Rauno’s comment that it does look like the chain of thought examples from the Baker et al. paper.
A friend runs a startup that does a lot of RL on CoT for a narrow class of tasks. She shares those traces with me; I can't share them, sorry. The style is incredibly similar, though, and I immediately pattern-matched it.
I think r1 should also have a similar style in its CoT.
The proofs o3 outputs are going to be different from the proofs it writes as it thinks about them. This is the style RLed models think in.
Did you have access to the full o3 reasoning trace, or just the final output? The two are not the same style at all.
Only the output! I thought Mikhail was referring to the output here, since that's what we see for the IMO problems.
But as I see it now, the consensus seems to be something like: "The chain of thought of new models does look like the IMO problem solutions, and if you don't train the model to produce final answers that look nice to humans, then the answers will look like the chain of thought. Probably the experimental model's answers were not yet trained to look nice."
Is this your position? I think that’s pretty plausible.
You get the gist. I don’t think I’ve ever seen this specific style, but raw reasoning traces can end up looking even weirder.