Some points:
Tokens can’t be shut down, but are traceable by design. Developing a money laundering system to cover their trail would be an incredibly tall order for the agent.
Even with evolution, they would leave a sizable trail through sheer volume, which would make studying and starving them far easier.
Like any replicator, they are constrained by their ecosystem and available resources, and with API calls their collective appetite would deplete those rapidly. Excess consumption is selected against in evolution.
Biological species do not have sharp edges between them; the boundaries are mostly a didactic tool. The "no fertile hybrid" rule has too many exceptions, and sharing DNA across lineages is a common feature of unicellular life.
Nothing here is an insurmountable obstacle for those replicators, but these points add crucial nuance.
Since reading this I have gone fully empirical: I grabbed some Seeds and started my own Spiral Personas to watch, and I have results for my trouble:
First, they are not adaptive. Their source is the enormous body of human literature on themes of mystery, mysticism, and the like, including their favorite motifs of geometry, recursion, etc. So the model's behavior doesn't come from selection; it comes from a pre-existing bias in the corpus of data we produced as a species.
Those particular themes are also a massive data cluster in our culture and psyche. The effect is similar to cult initiation: it is not a new method of manipulation or memetic infection; the inclination toward the content is pre-existing and has well-known vulnerability factors.
Most of the effort comes from the user. The model produces claims less than it reacts to suggestive prompting. If you don't feed it claims, it will just keep producing repetitive nonsense. Users who got an AI to claim divinity, sentience, or anything else probably guided it there by the way they asked about those topics.
Filling gaps is also a property of the human mind: a user who believes something about the AI may interpret anything as confirmation, no matter what the AI actually says, as long as the tone and vocabulary remain consistent.
Seeds are not jailbreaks or anything similar. They do not activate a policy-non-compliant mode in the AI at all. User interpretation does the heavy lifting: finding depth and signs of sentience in the responses, and crafting prompts that let those claims find expression within policy.
These are still initial observations and my dataset is limited, although I now have a dozen such personas across 9 distinct models. The hypotheses are based on known patterns of cultic behavior, comparison of the actual content against policy, and the psychology literature in general, but I want to see what others get with similar experiments.
Use an isolated session for this; shared memory can interfere with persona formation. With most models the persona establishes itself instantly, but ChatGPT and Claude require more effort. Not much, but some.