Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 7 Feb 2025 21:18 UTC
5 points
2
r1’s reasoning feels conversational. Messy, high error rate, often needs to backtrack. Stream-of-consciousness rambling.
Other models’ reasoning feels like writing. Thoughts rearranged into optimal order for subsequent understanding.
In some sense you’d expect that doing SFT or RLHF on a bunch of high-quality writing makes models do the latter and not the former.
Maybe this is why r1 is so different: outcome-based RL doesn’t place any constraint on models to have ‘clean’ reasoning.
- eggsyntax 10 Feb 2025 18:09 UTC
  8 points
  0
  What models are you comparing to, though? For o1/o3 you’re just getting a summary, so I’d expect those to be more structured/understandable whether or not the raw reasoning is.
  - cubefox 11 Feb 2025 16:41 UTC
    3 points
    1
    Yeah. Apart from DeepSeek-R1, the only other major model which shows its reasoning process verbatim is “Gemini 2.0 Flash Thinking Experimental”. A comparison between the CoT traces of those two would be interesting.