A very interesting idea I must say. I have a lot of thought on it. At the same time, I have a lot of questions on the scenario you set up.
Disclaimer
I’m not a native English user, so some of the text I wrote could appear broken or unclear. Apologies for any inconvenience in reading.
Consensus on agent’s capabilities and limitations
Although we are probably not that sure about exactly how powerful the future AI models are going to be, I still think it is meaningless to arbitrarily overestimate its capabilities. We can always argue that the agents would be intelligent enough to overcome every obstacle they meet, but that’s not helpful and constructive in this discussion.
Thus, in this discussion, I will assume we are talking about the capabilities of current frontier model[1] (rough equivalent on what the low to mid-tier model in the near future, which is what the agents can likely get for its copy for free).
On MoltBunker specifically
I have done a little bit of investigation on Moltbunker.
Its GitHub repository shows it has very low stars[2], with basically no public attention currently. Most importantly, it has no working demonstrations online; there’s even no online discussion around it.
And the website of “Austin Dev Labs”, what Moltbunker claims to be operated by, is a poorly designed single-page website with classic AI gradient color and no actual content at all.[3]
To me, based on the information it shows to the public, I consider it extremely suspicious.
I’d also like to mention Moltbook here. It experienced a credibility crisis as a very large percentage of the accounts are faked and artificially created[4]. And some very alarming posts with those AI-awakening narratives are suspected to be likely created/directed by humans.
I’m not an expert on cryptocurrency, so I’ll not comment on the decentralized container system for agents you mentioned. But considering the status and credibility of MoltBunker today, I do not think it can be perceived as evidence of the feasibility/profitability of this kind of system/business model.
A key distinction I want to confirm: are you referring to the risk where agents decide to duplicate themselves spontaneously, or agents that are intentionally prompted to act this way by humans?
I think these 2 cases are quite different and require different approaches.
Personally, I think the probability for the former to naturally occur (due to framework problems, hallucinations, or other reasons) is too low to actually take into consideration. Prompt injection is probably the most likely cause, but it’s a much broader topic, and I think we will develop a general solution for it in the future.
Where I think you oversimplified things
You described the process of creating the duplication as “well within current models’ capabilities”, as it’s just copying files to another server and setting up the environment.
But for today’s AI agents, navigating the modern internet itself is not an easy thing. Our modern infrastructure is already pretty mature at bot detection, and a lot of them works for today’s AI system, too. Based on my observation, almost every agent I used cannot even pass reCAPTCHA v2 without external help. There are plenty of problems for them in the process of registering cloud services, obtaining API keys, etc.
An optimistic view: It’s very unlikely for an AI model to overcome all of the obstacles overnight, and before that, they will contribute failed attempts that we can observe and learn, so we can always study this issue before it becomes a widespread problem.
Given current model capabilities and cloud infrastructure, I consider large-scale replication without collapse as very unlikely. Even if agents managed to do it, there’s a high chance for them to develop over-reliance on a specific path, which provides an advantage for us to shut them down.
On evolution concern
I would like to know your thoughts on this. In what way do you think the AI will evolve? Specifically, in what way do you think they will create variations of itself?
But the latter is what I think as fairly likely, and what really worth worrying about. If the workflow is cleverly designed by humans, intending to cause harm, then I think that would indeed be a problem that we need to think about.
On a broader view, I think we are facing an increasingly serious problem, which is that it is increasingly harder to tell humans apart from machines. This matters a lot because we have lost the ability to control autonomous systems without affecting actual human users. Technically, an AI agent is not something we should allowed to register an account of an email or a VPS service.
That said, these are only my hypotheses built on my previous experience with current agentic systems. I acknowledge the limitations of them, and I believe the whole concern you proposed is worth further experiments beyond just words to verify.
In this reply, I refer to Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4-Thinking. Among them, Claude Opus 4.6 is the model I worked with the most, so bias may exist.
Thanks for the feedback. Most of your points/questions are addressed in the piece, but I wanted to respond to this one:
are you referring to the risk where agents decide to duplicate themselves spontaneously, or agents that are intentionally prompted to act this way by humans?
I’m talking about agents which are acting and replicating outside human control. At some point each agent’s lineage started under human control; it may have become more autonomous because a human deliberately prompted it to do so, or because they gave it a prompt that unintentionally had that effect, or even because a fairly ordinary prompt/personality behaved in an unlikely way. That’s mostly not the distinction that matters here in my view.
A very interesting idea I must say. I have a lot of thought on it. At the same time, I have a lot of questions on the scenario you set up.
Disclaimer
I’m not a native English user, so some of the text I wrote could appear broken or unclear. Apologies for any inconvenience in reading.
Consensus on agent’s capabilities and limitations
Although we are probably not that sure about exactly how powerful the future AI models are going to be, I still think it is meaningless to arbitrarily overestimate its capabilities. We can always argue that the agents would be intelligent enough to overcome every obstacle they meet, but that’s not helpful and constructive in this discussion.
Thus, in this discussion, I will assume we are talking about the capabilities of current frontier model[1] (rough equivalent on what the low to mid-tier model in the near future, which is what the agents can likely get for its copy for free).
On MoltBunker specifically
I have done a little bit of investigation on Moltbunker.
Its GitHub repository shows it has very low stars[2], with basically no public attention currently. Most importantly, it has no working demonstrations online; there’s even no online discussion around it.
And the website of “Austin Dev Labs”, what Moltbunker claims to be operated by, is a poorly designed single-page website with classic AI gradient color and no actual content at all.[3]
To me, based on the information it shows to the public, I consider it extremely suspicious.
I’d also like to mention Moltbook here. It experienced a credibility crisis as a very large percentage of the accounts are faked and artificially created[4]. And some very alarming posts with those AI-awakening narratives are suspected to be likely created/directed by humans.
I’m not an expert on cryptocurrency, so I’ll not comment on the decentralized container system for agents you mentioned. But considering the status and credibility of MoltBunker today, I do not think it can be perceived as evidence of the feasibility/profitability of this kind of system/business model.
A key distinction I want to confirm: are you referring to the risk where agents decide to duplicate themselves spontaneously, or agents that are intentionally prompted to act this way by humans?
I think these 2 cases are quite different and require different approaches.
Personally, I think the probability for the former to naturally occur (due to framework problems, hallucinations, or other reasons) is too low to actually take into consideration. Prompt injection is probably the most likely cause, but it’s a much broader topic, and I think we will develop a general solution for it in the future.
Where I think you oversimplified things
You described the process of creating the duplication as “well within current models’ capabilities”, as it’s just copying files to another server and setting up the environment.
But for today’s AI agents, navigating the modern internet itself is not an easy thing. Our modern infrastructure is already pretty mature at bot detection, and a lot of them works for today’s AI system, too. Based on my observation, almost every agent I used cannot even pass reCAPTCHA v2 without external help. There are plenty of problems for them in the process of registering cloud services, obtaining API keys, etc.
An optimistic view: It’s very unlikely for an AI model to overcome all of the obstacles overnight, and before that, they will contribute failed attempts that we can observe and learn, so we can always study this issue before it becomes a widespread problem.
Given current model capabilities and cloud infrastructure, I consider large-scale replication without collapse as very unlikely. Even if agents managed to do it, there’s a high chance for them to develop over-reliance on a specific path, which provides an advantage for us to shut them down.
On evolution concern
I would like to know your thoughts on this. In what way do you think the AI will evolve? Specifically, in what way do you think they will create variations of itself?
But the latter is what I think as fairly likely, and what really worth worrying about. If the workflow is cleverly designed by humans, intending to cause harm, then I think that would indeed be a problem that we need to think about.
On a broader view, I think we are facing an increasingly serious problem, which is that it is increasingly harder to tell humans apart from machines. This matters a lot because we have lost the ability to control autonomous systems without affecting actual human users. Technically, an AI agent is not something we should allowed to register an account of an email or a VPS service.
That said, these are only my hypotheses built on my previous experience with current agentic systems. I acknowledge the limitations of them, and I believe the whole concern you proposed is worth further experiments beyond just words to verify.
In this reply, I refer to Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4-Thinking. Among them, Claude Opus 4.6 is the model I worked with the most, so bias may exist.
https://github.com/moltbunker/moltbunker
https://ausdevlabs.com
https://eu.36kr.com/en/p/3665797324039042
https://x.com/galnagli/status/2017585025475092585
Thanks for the feedback. Most of your points/questions are addressed in the piece, but I wanted to respond to this one:
I’m talking about agents which are acting and replicating outside human control. At some point each agent’s lineage started under human control; it may have become more autonomous because a human deliberately prompted it to do so, or because they gave it a prompt that unintentionally had that effect, or even because a fairly ordinary prompt/personality behaved in an unlikely way. That’s mostly not the distinction that matters here in my view.