Only the output! I thought Mikhail was referring to the output here, as this is what we see for the IMO problems.
But as I see it now, the consensus seems to be something like “The chain of thought of new models does look like the IMO problem solutions, and if you don’t train the model to produce final answers that look nice to humans, then they will look like the chain of thought. Probably the experimental model’s answers were not yet trained to look nice”.
Is this your position? I think that’s pretty plausible.
You get the gist. I don’t think I’ve ever seen this specific style, but raw reasoning traces can end up looking even weirder.