OAI models rely more on CoT for their capabilities than Anthropic models do. E.g. the gap between their benchmark scores with and without CoT is larger.
Anthropic models draw less of a distinction between their CoT and their final output than OAI models do. This suggests that RL probably puts more pressure on Anthropic models' CoT. See here.