Models can’t learn that much during current RL, which means their reasoning is somewhat closely tied to how a human would solve a problem.
Should it be “their reasoning is closely tied to the reasoning the models learned in order to predict pretraining data, which might or might not have anything to do with how a human would do it.”?
Even if the model is producing tokens that a human would normally produce, the pattern of the tokens is only the “frosting”; the “cake” is the nonverbal reasoning that produces the tokens, and that reasoning is not human-like by default even when it produces human-like tokens (e.g. see Anthropic’s recent study of the weird heuristics LLMs use for addition).
The thing I am trying to point at is what happens in the CoT reasoning. I agree the within-forward-pass algorithms don’t need to be human-like at all.
In principle you could get a CoT that works nothing like “human reasoning”, e.g. because some structure common in pretraining data (in certain codebases, or in procedurally generated reports) happens to be useful for reasoning. But I am not aware of such examples, and on priors that seems not that likely, because that text was not “made to be useful reasoning” (while human-generated reasoning is).