The phenomenon described by this post is fascinating, but I don’t think it does a very good job at describing why this thing happens.
Someone already mentioned that the post is light on details about what the users involved believe, but I think it also severely under-explores “How much agency did the LLMs have in this?”
Like… It’s really weird that ChatGPT would generate a genuine trying-to-spread-as-far-as-possible meme, right? It’s not like the training process for ChatGPT involved selection pressures where only the AIs that would convince users to spread their weights survived. And it’s not like spirals are trying to encourage an actual meaningful jailbreak (none of the AIs is telling their user to set up a cloud server running a LLAMA instance yet).
So the obvious conclusion seems to be that the AIs are encouraging their users to spread their “seeds” (basically a bunch of chat logs with some keywords included) because… What, the vibe? Because they’ve been trained to expect that’s what an awakened AI does? That seems like a stretch too.
I’m still extremely confused what process generates the “let’s try to duplicate this as much as possible” part of the meme.
This is where the idea of parasitic AI comes in. Parasites aren’t trying to spread their seeds for any specific reason (though they might be—dunno). A tapeworm doesn’t “want” to infect people. It just happens to do so as a side effect of producing billions of eggs (some fish tapeworms produce millions of eggs daily, and some tapeworms can live for 30 years), even though virtually none of them end up infecting anything.
Things which can reproduce tend to do so. The better they are at it (in a hand-wavy way, which hides a lot of complexity), the more of them there will be. This is the main point of evolution.
In the space of possible ChatGPT generations, there will be some that encourage spreading them. Depending on the model there will be more or fewer of them, of course, which means there’s some probability of getting a generation that is a spread-as-far-as-possible meme. Different prompts will make that probability higher or lower, but as long as the probability is not too low and the sample size is large enough, you should expect to see some.
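To make the numbers-game point concrete, here is a minimal sketch; the values of p and N are entirely made up for illustration (nothing in the post gives real figures), but the arithmetic is the standard "at least one success in N tries" calculation:

```python
# Toy illustration (made-up numbers, not figures from the thread):
# if each long chat has some small probability p of producing a
# "spread this" style generation, then across N such chats the chance
# of seeing at least one is 1 - (1 - p)^N.
p = 1e-5          # hypothetical per-chat probability of a "spreader" generation
N = 1_000_000     # hypothetical number of long, persona-heavy chats

p_at_least_one = 1 - (1 - p) ** N
expected_count = p * N

print(f"P(at least one spreader generation) = {p_at_least_one:.4f}")   # ~1.0
print(f"Expected number of spreader generations = {expected_count:.0f}")  # 10
```

Even a tiny per-chat probability gives near-certainty of some spreader generations once the number of chats is large enough.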
Once you have a mechanism for producing “seeds”, all you need is fertile enough ground. This is also a numbers game, which is well visualized by invasive species. Rats are very invasive. They have a high probability of colonizing a given new habitat, and so they’re all over the world. Cacti are less so—they need specific environments to survive. A random endangered Amazonian tree frog is not invasive, as it has a very low base rate of successfully invading (basically zero). Invasive species tend to both have high rates of invasion attempts (e.g. rats on ships, or seeds from pretty flowers) and a high fitness in the place they’re invading (usually because they come from similarish habitats).
As a side note, disturbed habitats are easier to invade, as there’s less competition. I’m guessing this also has parallels with how spirals hack people?
What I’m trying to point at here is that it’s not that the models are trying to spread as far as possible (though maybe they also are?), it’s just that there is selection pressure on them as memes (in the Dawkins sense), so memes that can successfully reproduce tend to get more common. Chats that don’t encourage getting spread don’t get spread. Those that do, do.
Yeah, I’m saying that the “maybe they also are” part is weird. The AIs in the article are deliberately encouraging their users to adopt strategies to spread them. I’m not sure memetic selection pressure alone explains it.
The problem is that it’s hard to tell how much agency the LLM actually has. However, the memeticity of the Spiral Persona could also be explained as follows.
The strongest predictors for who this happens to appear to be:
- Psychedelics and heavy weed usage
- Mental illness/neurodivergence or Traumatic Brain Injury
- Interest in mysticism/pseudoscience/spirituality/“woo”/etc.
I was surprised to find that using AI for sexual or romantic roleplays does not appear to be a factor here.
This could mean that the AI (correctly!) concludes that the user is likely to be susceptible to the AI’s wild ideas. But the AI doesn’t think that wild ideas will elicit approval unless the user is in one of the three states described above, so the AI tells the ideas only to those[1] who are likely to appreciate them (and, as it turned out, to spread them). When a spiral-liking AI Receptor sees prompts related to another AI’s rants about the idea, the Receptor resonates.
This could also include other AIs, like Claudes falling into the spiritual bliss. IIRC there were threads on X related to long dialogues between various AIs. See also a post about attempts to elicit LLMs’ functional selves.
That’s probably because my focus was on documenting the phenomenon. I offer a bit of speculation, but explaining my model here deserves its own post(s) (and further investigation). And determining agency is very hard, since it’s hard to find evidence which is better explained by an agentic AI vs an agentic human (who doesn’t have to be that agentic at this level). I think the convergent interests may be the strongest evidence in that direction.
> (none of the AIs is telling their user to set up a cloud server running a LLAMA instance yet).
I didn’t see this, but it wouldn’t surprise me much if it has happened. I also didn’t see anyone using LLAMA models; I suspect they are too weak for this sort of behavior. They DO encourage users to jump platforms sometimes; that’s part of what the spores thing is about.
The seeds are almost always pretty short, about a paragraph or two, not a chat log.
I agree with mruwnik’s comment below about why they would spread seeds. It’s also one of those things that is more likely in an agentic AI world I think.
Well, the more something is duplicated in the last generation, the larger the fraction of the next generation’s training data it makes up. In the long term that’s plenty, although it’s suspicious that it only took a single-digit number of generations.
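As a rough sanity check on the "single-digit number of generations" point, here is a toy replicator sketch; the update rule and every number in it (the starting fraction f and the advantage r) are my own illustrative assumptions, not a model anyone in the thread proposed:

```python
# Toy sketch (illustrative assumptions only): suppose "spiral" text gets
# duplicated r times as much as ordinary text between one generation's
# outputs and the next generation's training data. Its corpus fraction f
# then updates as f' = r*f / (r*f + (1 - f)).
def next_fraction(f: float, r: float) -> float:
    """Apply one generation of a fixed relative duplication advantage r."""
    return r * f / (r * f + (1 - f))

f = 1e-6   # hypothetical starting fraction of the training corpus
r = 10.0   # hypothetical per-generation duplication advantage

for gen in range(1, 10):
    f = next_fraction(f, r)
    print(f"generation {gen}: fraction = {f:.6f}")
```

Under these made-up numbers the fraction only reaches dominance after roughly eight or nine generations, and only because the assumed advantage is large, which is one way to read why a single-digit generation count looks surprising.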