Artem Karpov comments on Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability

Artem Karpov 3 Dec 2025 7:50 UTC
2 points
Thank you for publishing this, it is interesting! Especially the distinction between internal and external CoTs. Could you elaborate on how you extracted the internal CoTs? I checked your code, and it looks like you took everything before </think> as the CoT. As I understand it, for the Claude series the "internal" CoTs might be summarized by a smaller model. From their system card (https://www.anthropic.com/news/claude-sonnet-4-5):

As with Claude Sonnet 4 and Claude Opus 4, thought processes from Claude Sonnet 4.5 are summarized by an additional, smaller model if they extend beyond a certain point (that is, after this point the “raw” thought process is no longer shown to the user).

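So the 56.8 gap for Sonnet 4.5 in your Fig. 2 might be misleading: if the text before </think> is partly a summary rather than the raw thought process, the measurement reflects the summarizer's output as well as the model's own reasoning.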
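
To make concrete what I mean by "taking CoTs before </think>", here is a minimal sketch of that kind of extraction. It is my guess at the general approach, not your actual code: the function name split_cot and the assumption that the completion is a single string with an optional <think> opening tag are mine.

```python
def split_cot(completion: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split a raw completion into (internal CoT, external output).

    Hypothetical helper: assumes the reasoning is whatever precedes the
    first closing tag, and the visible answer is whatever follows it.
    """
    head, sep, tail = completion.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as external output.
        return "", completion.strip()
    # Drop one opening <think> tag if the provider includes it.
    internal = head.replace("<think>", "", 1).strip()
    return internal, tail.strip()


internal, external = split_cot("<think>check the units first</think> The answer is 4.")
print(internal)
print(external)
```

If the provider returns a summarized thought process, a split like this would return the summary as the "internal" CoT, with no way to tell from the string alone.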