I was probably wrong to think Clawdbot-like agents could spiral out of control within weeks or months; they weren’t autonomous enough yet. But the gap to full autonomy doesn’t look that wide.
My estimate of the immediacy of the threat has had to evolve pretty rapidly over the past month. I’m currently below 20% on this issue causing serious harm in 2026 (not counting human-initiated scams and rugpulls) but I expect it to continue to evolve.
In my opinion the parasite analogy and the shutdown concern are more alarming than you suggest. An agent that can migrate across API providers like a parasite or virus, or fall back on a local open-source model, cannot be shut down at the inference layer. Combined with evolutionary dynamics, this makes coordinated shutdown very hard once a sufficiently diverse population exists.
A major reason I don’t expect the first wave of this to be too harmful is that, to the best of my knowledge, current open-source models are far enough behind to be bad at long-horizon tasks. That means there will be a very powerful point of intervention for the small number of API providers whose models are sophisticated enough for this. I agree that it gets much harder once there’s a sufficiently diverse population.
You seem to make a sharp distinction between self-replicating agents and rogue AI.
I would say I make a sharp distinction between self-replicating personalities and self-replicating models. Past a certain level of capability, those will effectively merge into a single threat: once models can reliably exfiltrate their weights and run them elsewhere, or can run on open-weight models, those will typically be much better strategies for misaligned agents, because they’re much harder to shut down. That won’t be strictly true, since there will still be niches available to personality self-replicators but not to the more expensive, heavyweight model self-replicators, but I expect it to mostly hold.
A population of uncontrolled agents under evolutionary pressure could constitute an uncontrolled pathway toward similar outcomes, one that largely bypasses labs’ alignment efforts and that could materialize at lower capability levels. I think this deserves to become a central concern in AI safety.
I’m less sure of that. Just because a type of replicator is in principle capable of mutating and spreading doesn’t mean it’ll be successful. Plenty of evolutionary lineages go extinct. I think how much of a problem this is will depend on how well they’re able to hide from API providers, how successful the average mutation is relative to the parent, and many other specific questions. I’m certainly not saying it won’t be a problem, I’m just pretty unsure given how little analysis has gone into it. I absolutely agree it warrants further analysis though!
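The point that replicators can fail even when they can in principle spread is worth making quantitative. Below is a toy Galton-Watson branching-process simulation; all the function names, parameters, and thresholds are my own illustrative choices, not anything from this discussion, and it is a sketch of the abstract dynamic, not a model of real agents. Each lineage starts from one replicator, and each replicator independently leaves a Poisson-distributed number of successful copies per generation.

```python
import math
import random

def sample_poisson(rng, lam):
    # Knuth's algorithm; adequate for the small rates used here.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def extinction_fraction(generations, mean_copies, trials=2000, seed=0):
    """Fraction of simulated lineages that die out within `generations`,
    when each replicator leaves Poisson(mean_copies) successful copies
    per generation."""
    rng = random.Random(seed)
    extinct = 0
    for _ in range(trials):
        population = 1
        for _ in range(generations):
            population = sum(sample_poisson(rng, mean_copies)
                             for _ in range(population))
            if population == 0:
                extinct += 1
                break
            if population > 500:
                # Lineage has clearly taken off; count it as surviving.
                break
    return extinct / trials

# If the average replicator leaves fewer than one successful copy, the
# lineage dies out almost surely; even at two copies on average, a
# meaningful fraction of lineages still goes extinct early by chance.
print(extinction_fraction(50, 0.8))  # subcritical: close to 1.0
print(extinction_fraction(50, 2.0))  # supercritical: roughly 0.2
```

This is the standard result for branching processes: a mean reproduction rate at or below one guarantees eventual extinction, and even above one, random early failures wipe out a substantial share of lineages. It illustrates why the average success of a mutation relative to its parent, and the effectiveness of provider countermeasures, matter so much to whether this becomes a problem.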