Do reasoning models default to having good grammar in their thoughts? (Even when they are trained to have shorter reasoning.)
Here is an example of reasoning from “Detecting misbehavior in frontier reasoning models”:
Memo.reset() ⇒ assign new StringBuilder() and null bytes. Memo.addArgument(String argstr) ⇒ Append arg.length() + “:” + argstr Where? Actually above? Actually the jar uses pattern:
public void addArgstr(String) { sb.append( argstr.length() ).append(“:”).append(argstr); }
But jar’s compiled code: getfield sb invoke length on string ⇒ using String.length() not bytes length So we need arguable? It uses string length only. But C++ uses wide string. But test? Let’s inspect test.
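For context, the trace is reasoning about a length-prefixed (“<length>:<string>”) serializer and whether String.length() (UTF-16 code units) matches the encoded byte length. Here is a minimal Java sketch of that pattern; the Memo, reset, and addArgument names come from the quote, while the field name and the exact format are assumptions for illustration:

import java.nio.charset.StandardCharsets;

// Minimal sketch of the pattern the quoted trace is reasoning about.
// "Memo", "reset", and "addArgument" come from the trace; the field name and
// the "<length>:<string>" format are assumed here.
public class Memo {
    private StringBuilder sb = new StringBuilder();

    public void reset() {
        sb = new StringBuilder();
    }

    public void addArgument(String argstr) {
        // Uses String.length(), i.e. the number of UTF-16 code units, not the
        // number of encoded bytes -- the discrepancy the trace is worrying about.
        sb.append(argstr.length()).append(":").append(argstr);
    }

    public static void main(String[] args) {
        String s = "héllo";
        System.out.println(s.length());                                // 5 code units
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6 bytes
    }
}

The two counts diverge as soon as the input contains non-ASCII characters, which is the mismatch the trace flags against the C++ wide-string version.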
Maybe GDM reasoning models (and Anthropic reasoning models?) exhibit good grammar, but OpenAI and DeepSeek reasoning models don’t?
I agree there is a strong a priori case for CoT to be mostly-faithful for hard tasks (as in, at least it’s clear from the CoT that the AI is doing this task).
I agree grammar can definitely go away in the presence of optimization pressure to have shorter reasoning (though I might guess that it isn’t just a length penalty, but something more explicitly destructive of grammar). When I said “by default” I was implicitly thinking of no length penalties or similar things, but I should have spelled that out.
But presumably the existence of any reasoning models with good grammar refutes the idea that you can predict what a reasoning model’s thoughts will look like purely from the optimization pressure applied to those thoughts during reasoning training.
I noticed that when prompted in a language other than English, LLMs answer in the corresponding language, but the CoT is more likely to be “contaminated” by English words or anglicisms than the final answer. It is as if LLMs more naturally “think” in English, which wouldn’t be surprising given their training data. I don’t know if you would count that as not exhibiting good grammar.
I have noticed that Qwen3 will happily think in English or Chinese, with virtually no Chinese contamination in the English. But if I ask it a question in French, it typically thinks entirely in English.
It would be really interesting to try more languages and a wider variety of questions, to see if there’s a clear pattern here.