According to several Anthropic system cards, including the one for Claude Opus 4.6, the models Sonnet 4, Opus 4.1, and Sonnet 4.5 are very poor at bypassing a monitor in SHADE-Arena when their reasoning is visible. However, Opus 4.5 is significantly better at this, and Opus 4.6 is significantly better than Opus 4.5. So it’s possible the controllability trend you’ve observed in recent models has now reversed.