I believe this is mistaken (I’m the first author of the paper). For Claude 4 Sonnet, we used the reasoning tokens provided by the API. The other tested models do not expose reasoning tokens, so we instead had them output a chain of thought directly.
I think this could be better communicated in the paper: we only performed this step of analyzing the RL-trained reasoning in the Chain of Thought faithfulness section, since most models used in the paper do not provide reasoning tokens.
Thanks for the clarification. It seems I got a bit confused.