Does significant RL just make model reasoning chains weird, or is there some other reason Anthropic has quietly stopped showing raw thinking outputs?

Back when extended thinking for Claude Sonnet 3.7 was released, Anthropic showed the full reasoning chain:

> As well as giving Claude the ability to think for longer and thus answer tougher questions, we’ve decided to make its thought process visible in raw form.

Then with Claude 4 they introduced reasoning summaries, but said:

> Finally, we’ve introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full.
On September 18, 2025, Anthropic posted an article, Extended Thinking: Differences in Thinking Across Model Versions, which says the Messages API handles thinking differently across Claude Sonnet 3.7 and Claude 4 models, primarily in redaction and summarization behavior, and gives the following condensed comparison:
| Feature | Claude Sonnet 3.7 | Claude 4 Models |
| --- | --- | --- |
| Thinking Output | Returns full thinking output | Returns summarized thinking |
| Interleaved Thinking | Not supported | Supported with the `interleaved-thinking-2025-05-14` beta header |
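For concreteness, here is a minimal sketch in Python with the official `anthropic` SDK of what an extended-thinking request looks like; the model ID, token budgets, and prompt are illustrative, and the beta header only matters on Claude 4 models:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative; any thinking-capable model
    max_tokens=4096,                   # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    # Interleaved thinking (Claude 4 only) is opted into via a beta header:
    extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},
    messages=[{"role": "user", "content": "What is 27 * 453?"}],
)

# On Claude Sonnet 3.7 the "thinking" blocks carry the full raw chain of
# thought; on Claude 4 models they carry the (possibly summarized) version.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```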
The Sonnet 4.5 system card reiterates the “most thought processes are short enough to display in full” claim that you quote:
> As with Claude Sonnet 4 and Claude Opus 4, thought processes from Claude Sonnet 4.5 are summarized by an additional, smaller model if they extend beyond a certain point (that is, after this point the “raw” thought process is no longer shown to the user). However, this happens in only a very small minority of cases: the vast majority of thought processes are shown in full.
But it is intriguing that the displayed Claude CoTs are so legible and “non-weird” compared to what we see from DeepSeek and ChatGPT. Is Anthropic using a significantly different (perhaps less RL-heavy) post-training setup?
I think not making the CoTs weird is a tax on capabilities and limits the kind of research they can do. Also, they would need to train the CoTs not to display bad behavior (e.g., not to offend the user), which runs afoul of the Most Forbidden Technique, because it makes CoT monitoring less useful.