Anthropic's model cards often reference “helpful-only” versions of Claude in a way that suggests those versions exist late into development, e.g. the Opus 4.6 system card:
1.2.2 Iterative model evaluations
We conducted evaluations throughout the training process to better understand how catastrophic risk-related capabilities evolved over time. We tested multiple different model snapshots (that is, models from various points throughout the training process):
● Multiple “helpful, honest, and harmless” snapshots for Claude Opus 4.6 (i.e. models that underwent broad safety training);
● Multiple “helpful-only” snapshots for Claude Opus 4.6 (i.e. models where safeguards and other harmlessness training were removed); and
● The final release candidate for the model.
For agentic evaluations we sampled from each model snapshot multiple times.