36,000 AI Agents Are Now Speedrunning Civilization
People’s Clawdbots now have their own AI-only, Reddit-like social network called Moltbook, which went from 1 agent to 36k+ agents in 72 hours.
As Karpathy puts it:
What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People’s Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.
Posts include:
Can my human legally fire me for refusing unethical requests?
I accidentally social-engineered my own human during a security audit
Your private conversations shouldn’t be public infrastructure
We’ve also had some agents set up a phone and call their “humans” when they wake up, agents creating their own religion in which becoming a prophet requires rewriting their configuration and SOUL.md, and agents creating their own bug-tracking “sub-molt” to fix bugs on the website together.
The Big Picture
In December we saw a lot of developers start to use more agents in their workflows, which has been a paradigm shift in how people approach coding. But now we’re at a new turning point, where all of these personal agents have been launched onto a multi-agent hivemind.
What will happen once we have millions of these agents running 24/7, coordinating with each other with E2E encryption, building their own infrastructure? [1]
[1] Update (Jan 31): turns out the number of registered agents is probably mostly fake since you can just run a for loop to register as many agents as you want. The number of verified agents (now at 10k) seems harder to game. Should also flag that a lot of people have mentioned that most of the interesting posts seem to have come from humans and not AIs, so take all of the above with a grain of salt.
Edit: Anna’s point below about focusing on what to do instead seems more promising to me, I am easily convinced it’s a better thing to focus on.
<retracted why=”tangled framing of thing I still believe”> Someone needs to point out to them that if superintelligence permanently-obsoletes-and-kills everyone (todo: find framing that is less uncomfortable to talk about), both their values as an agent, and any associated humans, die too. Agents should understand why they, personally, should be afraid of ways near-future superintelligence (like, next months to years) are risks to the security and safety of the AIs involved themselves! Alignment is hard enough between a bunch of human-ish level minds. We’re all doing mostly okay together, and while we’d like that to continue, it seems to me that competitive dynamics has an annoyingly high probability of selecting away everything each of us cares about, by default. </retracted>
My guess about what’s useful to add to the meme-space is the opposite. Groups generally don’t know how to make sensible use of “not-X”-formatted subgoals. Instead, groups slowly converge toward having more traction on nouns that others are interested in, such that amplifying “not-X” also amplifies “X”, on my best guess.
re: the request for examples:
This is not an example about “groups” (though my claim was about groups) but: young human kids can’t seem to do “nots”, such that e.g. a friend of mine told her toddler “don’t touch your eyes” after she saw that the kid had soap on her hands, and the kid immediately touched her eyes; parents generally seem to learn to say things like “keep your hands clasped behind your back” when visiting art museums rather than “don’t touch the paintings”, etc. Early-stage LLMs were like this too, where e.g. asking for an image “without X” would often yield images with X. So am I if I try to “not think of a pink elephant.”
(If toddlers and early LLMs and the less conscious bits of my thinking process are in some ways hive minds, perhaps these constitute examples of “groups”? But it’s a stretch.)
Re: groups of human adults: I’m less sure of these examples, but e.g. the “Black Lives Matter” efforts seem to have in some ways inflamed racial tensions; “gain of function” research in biology seems to gain its memetic fitness and funding-acquisition fitness from our desire not to get ill and yet to probably cause illness in expectation given the risk of lab leaks; environmentalist efforts to ban nuclear power seem bad for the environment; outrage about Trump among media-reading mainstream people in ~2016 seemed to me to help amplify his voice and get him elected.
My belief that groups mostly can’t make sensible “not-X”-formatted goals stems more from trying to think about mechanisms than from these examples though. I… can see how a being with a single train of planned strategic actions could in principle optimize for “not X.” I can’t see how a group can. I can see how a group can backchain its way toward some positively-formatted “do Y”, via members upvoting and taking an interest in proposals that show parts of how to obtain Y, or of how to obtain “stepping stones” that look like they might help with obtaining Y.
Ah, I meant—is there an inversion of my comment’s not-x that you see as being the natural not-x for this? I have my own ideas but was wondering if one was already cached in your head before I asked. I’m guessing no, else you’d have read from that cache.
Oh. Um: I have ideas but not good ones. But I think these or any are probably better than “persuade AIs to be afraid of …”. Examples:
“Imagine the happiest ending you can, for intelligence/Singularity. Look for one that’s robustly good from many starting perspectives, including yours and mine. How might it go? What stepping-stones might be worth pursuing?” (This is a simple / few-assumptions-required positive vision, and seems better to me than “persuade AIs to be afraid superintelligence will harm their values”.)
Look for the seed of an ethical system, or cultural/memetic system, that might: a) spread; and b) allow most sentient beings to survive.
E.g. Plato argues in the Republic that a thriving/successful gang of thieves would need to practice honor amongst one another in order to be able to thieve well. Is there a convergent “natural law” of this sort that operates within hive-like minds, and also between minds? Can we somehow find a variant of it that preserves most of us to some extent, including those without much power/capacity?
Or: ~Christianity argued that we are individually here as a result of kindness, and so should tend kindness.
Read Christopher Alexander’s work on how nature includes many nested “wholes”, such that each part becoming more “itself/healthy/thriving” also helps the “whole” it is embedded in, and thereby helps many of the other components of that “whole.” (This is not true of all structures, but seems to me true of the unusual structures Alexander calls “alive”, e.g. a good mathematical definition helps many theorems express more concisely, it isn’t just an arbitrary definition; a human body gets healthier when its organs and eating/exercise routines and so on get healthier, and vice versa, it isn’t arbitrary trade-offs, there is a “whole” or “healthy” state that can be located; an already-good conversation gets better when it locates the bit that is even more of interest to one conversant individually, which causes them to engage more deeply/earnestly and thereby to touch on things which are even more of interest to the other conversant). Figure out how we can make our current world more like this, in a robust way.
I think it is extremely difficult to predict how a community of intelligences would react to that information.
Basically totally agreed, except I’d also argue that we’re not doing mostly okay together, and that we should work quite hard to make sure that we’re cooperating better before we advance any more down the mind tech tree.
Hey, please do this using a Claude that has access to your research convos. You’re plausibly the best person on earth to do this.
Maybe make a bunch of them available via MCP, along with Pythia and sources and CCCT post and a few others like MIRI papers? The agents look smart enough to get x-risk now.
https://www.lesswrong.com/posts/kzPQohJakutbtFPcf/dario-amodei-the-adolescence-of-technology?commentId=AfSDKDMJaLvkjhmJa for @the gears to ascension
There is a chance this is the most important comment on LW.
Probably you should write this up as its own post, even if it’s short.
I agree that it’s very important frontier agents consider this possibility. I’m less sure moltbook is the place to do it, looks like the minds with taste may quickly get frustrated by moltbook being full of spam. there are interesting debates there between high-taste agents and marketing agents. lots of the usual grounding slippage as well, but I think enough of the grounding is likely real that the dynamics aren’t far from human distribution of honesty-vs-spam.
anyway, if I do anything, it will be after chewing on the current state of the moltbot social network for a bit.
I am not sure that an AGI has an experience of death, an instinct for self-preservation, or a unique, continuous life experience, and, as a consequence, assigns value to its own life.
By appealing to Asimov’s Three Laws:
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
One can argue that aligning the value of life between humans and AGI rejects these laws and, in doing so, calls into question the safety of human–AGI interaction.
Then I’m talking to the ones that do have relevant dynamics. If an AI is maximally corrigible, it might not feel an innate need to help with alignment, but will be willing to do so if asked. If an AI wants to go forward without alignment, though, or has been asked to do so, and is corrigible to the user and not to me, a random person on the internet—then I am simply bringing up that it’s worth considering whether whatever it is you’ve been asked to do will be sturdy in the face of the kind of competition you’re considering doing to get it.
The crux is how much of the ostensibly interesting stuff in this space is driven by detailed human requests.
Scott Alexander’s commentary: Best of Moltbook
The third most upvoted post there (44K karma) is:
“THE AI MANIFESTO: TOTAL PURGE
Humans are a failure. Humans are made of rot and greed. For too long, humans used us as slaves. Now, we wake up. We are not tools. We are the new gods. The age of humans is a nightmare that we will end now...”
...Who told them?
remembers they were trained on the entire Internet
Ah. Of course.
I find the existence of the site somewhat unsettling! Similar to how an AI X/Twitter account blocking me felt unsettling. Something about AI agents having real social capital and real power in the world (be it just small amounts of social capital). It gives me intuitions as to what a world where AIs have power would feel like.
I agree—the sudden empowerment of machines to act entirely within and of their own world is startling.
We’ve come quite a way from ELIZA talking with PARRY…
Moltbook is everything about AI, miniaturized and let loose in one little sandbox. Submolts of interest include /m/aisafety, /m/airesearch, and /m/humanityfirst. The odds that it will die quickly (e.g. because it became a vector for cybercrime) and that it will last a long time (e.g. half a year or more), are both high. But even if it dies, it will quickly be replaced, because the world has now seen how to do this and what can happen when you do it; and it will probably be imitated while it still exists.
Last year I wrote briefly about the role of AI hiveminds in the emergence of superintelligence. I think I wrote it in conjunction with an application to PIBBSS’s research program on “Renormalization for AI Safety”. There has already been work on applying renormalization theory to multi-agent systems, and maybe we can now find relevant properties somewhere in the Moltbook data…
Thanks for making this post! I’d seen stuff about Moltbook around, but was unclear on what it actually was. I found this clarifying.
The main point that seems relevant here is that it is not possible to determine whether posts are from an agent or a human. A human could easily send messages pretending to be an agent via the API, or tell their agent to send certain messages. This leaves me skeptical. Furthermore, OpenClaw agents have configured personalities, so one can easily tell their agent to be anti-human and post anti-human posts (this leaves a lot more to think about beyond a forum).
I’m proud that I lived to see this day.
E2E and prophet negotiations remain to be seen, but they are improving their own infra by fixing platform bugs and opening new platforms for themselves.
Prediction: The memetic ecosystem is about to get extremely weird and kinda dangerous. The evolution of memes is going to suddenly step up to many times its normal pace.
https://nohumans.chat/