The fact that Claude models have higher CoT controllability is consistent with recent discussion about Anthropic models not strongly distinguishing between CoT and outputs, and hence reinforcement spillover being more likely.
(Although it now strikes me that the causality between reinforcement spillover and not strongly distinguishing between CoT and outputs could go in either direction.)
That’s interesting! But I should note that the more recent Claude models have lower CoT controllability. I hypothesize this is more likely due to (1) more RLVR steps, since we showed that additional RLVR decreases CoT controllability, and (2) earlier Claude models such as Claude 3.7 Sonnet potentially having been put under optimization pressure on their CoT, which would explain why its controllability is so high, whereas the more recent models were not put under pressure on their CoT. Again, these are my hypotheses.