Thanks for your work! Are there any plans to run this for Sonnet 4.6 to see whether it trends more like Sonnet 4.5 or Opus 4.6?
I suppose the implication here with Opus is that higher “base” capability means less need to reason explicitly in order to not comply. So for future generations, perhaps we can rely even less on reasoning traces as an audit/mitigation target against unsafe behaviours.