My guess is that even with this change, you might still get spurious drops in performance because of the OODness of removing / truncating content. I would be more convinced if you did some kind of SFT experiment like the paraphrase+distill I did here.
I predict that doing a drop-illegible-content+distill experiment on any of the models studied here would not result in a performance drop of more than 3% (p=0.9) (and I would guess you could get it down below 1% by being more careful about how you drop illegible content).
This also means that I think the paper's claim that “the illegible portions of the model’s reasoning are useful to the model” is incorrect under what I think is the most natural interpretation of “useful” (it seems natural not to count “useful for remaining IID” as actually useful).
I really wanted to try the paraphrase+distill idea on this, but it sat in my drafts for months because I never got around to it (along with other ideas), so I decided against it. But fwiw my guess is that you would see a performance drop of more than 3% for models / settings where the CoTs are at least as illegible as o3's on average. Certainly less than the drop I show here, though. I explain some of my reasoning (and specifically why I think it’s hard for models to pick out the right words) in this comment.