We show that a range of frontier models (Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro) can be prompted to “early exit” their CoT and displace reasoning into the response. This undermines the controllability frame: these prompted models retain most of their reasoning capability (a 4–8pp average accuracy cost, vs. 20–29pp for no reasoning at all) while moving it into the stylistically controllable channel.
I’m sure I’m just missing context, but why are models better at controlling style outside of their CoT? I’m surprised by this; the distinction between CoT and “reasoning in your output, e.g. via code comments” seems very weak/blurry to me.
Oh right, of course, thanks!