A friend runs a startup that does a lot of RL on COT on a narrow class of tasks. She shares the resulting chains of thought with me, but I can't share them, sorry. The style is incredibly similar, though: I pattern-matched it immediately.
I think r1 should also have a similar style in its COT.
The proofs o3 outputs are going to be different from the proofs it writes while it thinks about them. This is the style RLed models think in.