and I’d argue some evidence that in the case of o3 it significantly degraded CoT legibility
Is there a good reason to expect that o3’s degraded legibility was caused by deliberative alignment, rather than a bug in the RL process, a bad initialization, lots of RL pressure, or something else like that?
The link is currently broken; this appears to be from “[Valence series] 2. Valence & Normativity”.