What models are you comparing to, though? For o1/o3 you’re just getting a summary, so I’d expect those to be more structured/understandable whether or not the raw reasoning is.
Yeah. Apart from DeepSeek-R1, the only other major model which shows its reasoning process verbatim is “Gemini 2.0 Flash Thinking Experimental”. A comparison between the CoT traces of those two would be interesting.