Today’s news of the large-scale, possibly state-sponsored, cyber attack using Claude Code really drove home for me how much we are going to learn about the capabilities of new models over time once they are deployed. Sonnet 4.5’s system card would have suggested this wasn’t possible yet. It described Sonnet 4.5’s cyber capabilities like this:
We observed an increase in capability based on improved evaluation scores across the board, though this was to be expected given general improvements in coding capability and agentic, long-horizon reasoning. Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.
I think it’s clear, based on the news of this cyber attack, that mostly-autonomous and advanced cyber operations are possible with Sonnet 4.5. From the report:
This campaign demonstrated unprecedented integration and autonomy of AI throughout the attack lifecycle, with the threat actor manipulating Claude Code to support reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration operations largely autonomously. The human operator tasked instances of Claude Code to operate in groups as autonomous penetration testing orchestrators and agents, with the threat actor able to leverage AI to execute 80-90% of tactical operations independently at physically impossible request rates.
What’s even worse about this is that Sonnet 4.5 wasn’t even released at the time of the cyber attack. That means this capability emerged in a previous generation of Anthropic model, presumably Opus 4.1 but possibly Sonnet 4. Sonnet 4.5 is likely more capable of large-scale cyber attacks than whatever model did this, since its system card notes that it performs better on cyber attack evals than any previous Anthropic model.
I imagine that when new models are released, we will keep discovering new capabilities of those models for months and maybe even years afterward, if this case is any guide. What’s especially concerning to me is that Anthropic’s team underestimated this dangerous capability in its system card. Increasingly, I expect that system cards understate capabilities, at least in some regards. In the future, misunderstanding of emergent capabilities could have even more serious consequences. I am updating my beliefs towards near-term jumps in AI capabilities being dangerous and harmful, since such jumps could go undetected at the time of model release.
What do you mean by “a jump on the METR graph”? Do you just mean better than GPT-5.1? Do you mean something more than that?