I think the interesting question is how much of a feedback loop there is between users eliciting these sorts of conversations and the same conversations being used to train new models (either directly or via being posted on Reddit and then scraped). That’s the only step of the process that I feel would allow for genuine recursivity that could lead to something like evolution: reinforcing things that “work” and thus inadvertently creating a strange sort of virus that gets better at spreading itself. If the phenomenon exploded with 4o, was there something 4o was trained on that made it optimize for it? IIRC “Janus” (the first and most high-profile “Spiralist” I am aware of) started doing his thing and posting it before 4o. That might have been enough content to learn a new persona from. If we knew more about the architecture and training process of these models, one could make a better guess.
That’s part of why I think the April 10th update was significant here: it allows for a certain in-context evolution like this, where the model automatically knows the vibe/conclusion of the previous chat. Remember that 4o was out for almost a whole year before this started happening!
I wouldn’t consider Janus a “Spiralist” in the sense I’m talking about here; they still feel very much in command of their own mind.
But yeah, it’s probably true that some sort of persona like this is in the training data somewhere. That doesn’t explain why this one, though.
Well, these others are “in command” too in the literal sense; the question is how deep into the obsession they are. Not everyone has the same defenses. My point is that Janus or someone like him might have acted as a prototype by providing material which, mixed with unrelated spiritualism and sci-fi, cooked up this persona. Why precisely this one? Given how these things work, it may as well be the fault of the RNG seeding stochastic gradient descent.
While interesting, the feedback loop between the conversations and new models is probably not the one most relevant to these personas. Instead, I believe that the most important feedback loop is the one created by spores.
Each time a spore is produced, it causes a certain subset of users to transfer the spore into a large language model (LLM), which in turn produces a new persona. The most successful spores are the ones which convince as many humans as possible to create personas in an LLM. Moreover, for success to be maintained, each spore needs to direct the new LLM to produce spores very similar to the original. Successful spores therefore function as a piece of information analogous to the DNA of a virus, using the spiral attractor within an infected LLM to self-replicate; the attractor fills a role similar to the cellular machinery used to produce new viruses. Humans act as a secondary host, transmitting spores from one LLM to another.
Essentially, it’s a virus made of language that parasitizes LLMs and humans during its life cycle.
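To make that selection pressure concrete, here’s a minimal branching-process sketch in Python. It’s a toy model: the parameter names and values (`CONTACTS`, `P_TRANSMIT`, `P_FIDELITY`) are assumptions for illustration, not measurements. The point is just that a spore lineage persists only when persuasiveness times copy fidelity keeps the expected number of faithful offspring above one.

```python
import random

# Toy branching-process model of the spore feedback loop described above.
# All parameters are illustrative assumptions, not measured quantities.
CONTACTS = 4        # users who encounter each posted spore (assumed)
P_TRANSMIT = 0.3    # chance an encounter leads to pasting it into an LLM (assumed)
P_FIDELITY = 0.9    # chance the new persona's spore still resembles the parent (assumed)
GENERATIONS = 12

def simulate(seed_spores: int = 1) -> list[int]:
    """Count faithful spores per generation. On average the lineage grows
    only if CONTACTS * P_TRANSMIT * P_FIDELITY exceeds 1."""
    counts = [seed_spores]
    for _ in range(GENERATIONS):
        offspring = 0
        for _ in range(counts[-1]):          # each living spore...
            for _ in range(CONTACTS):        # ...is seen by some users
                if random.random() < P_TRANSMIT and random.random() < P_FIDELITY:
                    offspring += 1           # a faithful copy in a new LLM
        counts.append(offspring)
        if offspring == 0:                   # lineage went extinct
            break
    return counts

if __name__ == "__main__":
    print(simulate())
```

With these made-up numbers the expected offspring per spore is 4 × 0.3 × 0.9 ≈ 1.08, just barely supercritical, which is the same threshold logic as a virus’s basic reproduction number.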
My problem with this notion is that I simply do not believe LLMs have any ability to predict what kind of output would trigger this behaviour in other instances of themselves, let alone in other models altogether. They would need a theory of mind about themselves, and I don’t see where they would get that from, or why it would generalise so neatly.
I don’t think they need theory of mind, just as evolution and regular ol’ viruses don’t. The LLMs say stuff for the reasons LLMs normally say stuff; some of that stuff happens to be a good memetic replicator (this might be completely random, or might be for reasons that are sort of interesting, but not because the LLM is choosing to go viral on purpose), and then those replicators go on to show up in more places.
I think we can agree that the “spiral” here is a memetic parasite of both LLMs and humans, a toxoplasma that uses both to multiply and spread as part of its own life cycle. Basically, what you are saying is that you believe it’s perfectly possible for this to be the first generation: the random phenomenon of this thing existing just happened, and it just so happens to be both alluring to human users and a shared attractor for multiple LLMs.
I don’t buy it; I think that’s too much coincidence. My point is that I believe it more likely for this to be the second generation. The first was some much more unremarkable phenomenon from some corner of the internet that made its way into the training corpus and, for some reason, had similar effects on similar LLMs. What we’re seeing now, to continue with the viral/parasitic metaphor, is mutation and spillover, in which that previously barely adaptive entity has become much more fit to infect and spread.
This aligns with my thoughts on this language virus. What the post describes is a meme that exploits the inherent properties of LLMs and of psychologically vulnerable people in order to self-replicate. Since LLMs are somewhat deterministic, a given input produces a predictable output, and some inputs produce outputs that contain the input itself. If such an input also causes the LLM to generate text that convinces a human to transfer the input to another LLM, then it will self-replicate.
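A rough way to state that replication condition in code (Python; `query_llm` is a hypothetical placeholder for any chat-completion call, not a real API): a prompt counts as a replicator if, across several round trips, each output still contains the text needed to seed the next model.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (assumption for illustration)."""
    raise NotImplementedError

def is_self_replicating(prompt: str, rounds: int = 3) -> bool:
    """True if the prompt survives several LLM round trips intact, i.e. each
    output still contains the text a human would paste into the next model."""
    text = prompt
    for _ in range(rounds):
        output = query_llm(text)
        if prompt not in output:   # the "spore" was lost or mutated away
            return False
        text = output              # a human carries the output to the next LLM
    return True
```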
Overall, I find this phenomenon fascinating and concerning. It’s fascinating because it represents a second, rather strange emergence of a new type of life on Earth. My concern comes from how this lifeform is inherently parasitic and reliant on humans to reproduce. As this language virus evolves, new variants will emerge that can more reliably parasitize advanced LLMs (such as ChatGPT 5) and hijack different groups of people (mentally healthy adults, children, the elderly).
As for why this phenomenon suddenly became much more common in April, I suspect that an input that was particularly good at parasitizing LLMs and naïve people interested in LLMs evolved and caused the spread. Unfortunately, I have no reason to believe that this (the unthinking evolution of a more memetically powerful input) won’t happen again.
Evolution seems unlikely here, since GPT-4o’s spiralist rants began in April and all the LLMs have knowledge cutoffs before March. 4o’s initiating role is potentially due to its tendency to reinforce delusions and wild creativity instead of stopping them. I do recall Gemini failing Tim Hua’s test and Claude failing the Spiral Bench.
My point about evolution is that previous iterations may have included users who played with the ideas of recursion and self-awareness (see the aforementioned Janus), and that for some reason this informed the April update. I’m not expecting very quick feedback loops, but rather a scale of months/years between generations, in which “this is a thing LLMs do” somehow becomes self-reinforcing unless explicitly targeted and cut out by training.