faul_sname comments on Baybar’s Shortform

faul_sname 13 Nov 2025 23:46 UTC
14 points
0

Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.

I expect that it is technically true that Claude Sonnet 4.5 is not capable of doing advanced cyber operations, but being unable to do advanced cyber operations isn’t that important of a lack of capability if being able to do simple cyber operations is sufficient. And indeed

The operational infrastructure relied overwhelmingly on open source penetration testing tools rather than custom malware development. Standard security utilities including network scanners, database exploitation frameworks, password crackers, and binary analysis suites comprised the core technical toolkit. These commodity tools were orchestrated through custom automation frameworks built around Model Context Protocol servers, enabling the framework’s AI agents to execute remote commands, coordinate multiple tools simultaneously, and maintain persistent operational state

Running these tools is not difficult once you’ve learned your way around them, and learning your way around them is not very hard either. The fact that frontier LLMs aren’t at the level of top humans in this domain doesn’t actually buy us much safety, because the lowest-hanging fruit is hanging practically on the ground. In fact, I expect the roi on spear-phishing is even higher than the roi of competently running open source scanners, but “we caught people using Claude to find the names of the head of IT and some employees of companies and send emails impersonating the head of IT asking employees to compile and reply with a list of shared passwords” doesn’t sound nearly as impressive as “Claude can competently hack”. Even though the ability to write convincing spear-phishing messages is probably more threatening to actual security.

For that matter, improving on existing open-source pentesting tools is likely also within the capability envelope of even o1 or Sonnet 3.5 with simple scaffolding (e.g. if you look at the open metasploit issues lots of them are very simple but not high enough value for a human to dedicate time to). But whether or not that capability exists doesn’t actually make all that much difference to the threat level, because again the low-hanging fruit is touching the ground.