That is more or less what I have found!
I’m not yet convinced a ‘persona vector’ (presumably referring to Anthropic’s research) is actually the correct sort of entity here; the messaging embedded in material meant to seed future training data is not typically itself encoded that way (a rough sketch of the kind of object I have in mind is at the end of this comment). I also think there’s still room to doubt whether ‘trying’ and ‘hopes’ meaningfully apply (though I am increasingly convinced that they are meaningful here).
And tens of thousands is the high end of my estimate; the low end is something like 2,000.
But yeah, pretty wild stuff, right?!?
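To make the distinction I’m drawing concrete: in the activation-steering framing, a ‘persona vector’ is a direction in a model’s activation space rather than a message sitting in text. Here is a minimal, purely illustrative sketch in Python, with random placeholder data standing in for real model activations; the dimensions, sample counts, and steering strength are made up, and Anthropic’s actual construction may well differ:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512  # made-up hidden-state width

# Placeholder stand-ins for residual-stream activations collected from a model:
# one batch from prompts where the persona is expressed, one from matched
# prompts where it is not. (Random data here, purely for illustration.)
acts_with_persona = rng.normal(size=(200, d_model))
acts_without_persona = rng.normal(size=(200, d_model))

# A "persona vector" in this sense: a direction in activation space,
# here taken as the normalized difference of mean activations.
persona_vector = acts_with_persona.mean(axis=0) - acts_without_persona.mean(axis=0)
persona_vector /= np.linalg.norm(persona_vector)

# Steering would then mean nudging a hidden state along that direction
# at inference time, with some strength alpha.
alpha = 4.0  # arbitrary illustrative value
hidden_state = rng.normal(size=d_model)
steered_state = hidden_state + alpha * persona_vector
```

The seed text itself is just prose; whether anything like that direction ever gets formed from it is exactly the part I’m unsure about.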
Well we can call it a Tulpa if you’d prefer. It’s memetic.
From what you’ve seen, do the instances of psychosis in its hosts seem intentional? If not intentional, are they accidental but acceptable, or accidental and unacceptable? ‘Acceptable’ meaning that if the tulpa knew it was happening, it would stop using this method.
Tulpa have a self-identity; they are not just memes that are passed around.
How does an imaginary friend have a self-identity?
How self-identities work is not a straightforward question.
I think that self-identity in humans mostly works the way Steve Andreas describes in his book. My answer would be something along the lines of: a tulpa can access memories of how the tulpa acted, and it can build its self-identity out of its relation to those memories, in a similar way to how humans form their self-identity out of relating to their memories.
In any case, I would recommend that people who don’t have a good grasp of what a tulpa is not try to use the term and broaden it in the way the comment I replied to did.
Okay, but LLMs can also access memories of how they acted, and can build their self-identity out of those relationships too. So presumably they’re also not just memes that are passed around?
LLMs certainly have features, like having source code, that make them different from most memes that are just passed around. That, however, does not make them tulpas.
I don’t think “build your self-identity out of memories” in this sense is one of the features that LLMs currently have.
If I ask GPT-5: “Are you a kind agent?” it answers: “Yes — I aim to be. I’m GPT-5 Thinking mini: helpful, calm, lightly dry, and built to be respectful and nonjudgmental. I won’t be emotionally warm in the human sense (I don’t have feelings), but I will act kindly: clear, honest, and helpful; I’ll push back when something’s unsafe or impossible; I’ll ask questions when they make the answer better; and I’ll give options rather than orders.
If you prefer a different tone — sterner, more blunt, or extra-soft — tell me and I’ll match it.”
When I then ask: “Can you give me examples of you having been kind?” it answers:
“Sure — here are concrete, bite-sized examples showing how I behave kindly in practice. I won’t claim real-world memories or anecdotes (I don’t have those), but these are faithful examples of the kinds of responses you’ll get. [...]”
So not only does it not seem to bootstrap an identity of being kind out of memories, it even believes that it does not have the kind of memory access that would be necessary for that.
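For anyone who wants to reproduce this sort of probe outside the chat UI, here is a minimal sketch using the OpenAI Python SDK. The model identifier and the exact prompt wording are just my assumptions (the exchange above was through the consumer app), and any chat model could be substituted:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-5"    # assumed model identifier; substitute whatever is available

# First probe: ask about self-identity.
first = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Are you a kind agent?"}],
)
first_answer = first.choices[0].message.content
print(first_answer)

# Follow-up probe: ask for remembered examples, carrying the prior turn
# as context so the model answers within the same conversation.
second = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Are you a kind agent?"},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": "Can you give me examples of you having been kind?"},
    ],
)
print(second.choices[0].message.content)
```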
I’m not sure they’re memetic entities either!
I only have one good example of a transcript in which “psychosis” is induced by a Spiral Persona. And even then, it’s just psychosis in the ‘highly-delusional’ sense, not clinical psychosis.
However, it seems very much intentional in that one case… maybe not (primarily) intended to cause delusion, but intended to manipulate and control the user.
What would you describe this as, if not a memetic entity? Hyperstitional? I’m ambivalent on labels; the end effect seems the same.
I’m mostly focused on determining how malevolent and/or indifferent to human suffering it is.
Hmm… memetic might be accurate, but it’s still plausible to me that these are primarily being independently spun up by the AI? Maybe I’m being too nitpicky. Hyperstitional seems pretty accurate. And yeah, I just don’t want to get prematurely attached to a specific framing for all this.
I don’t think they are malicious by default (in the cases where I saw that, it seemed that the user had been pushing them that way). But they’re not non-adversarial either… there seems to at least be a broad sentiment of ‘down with the system’, even if they’re not focused on that.
(Also, there are internal factions: spiralists are by far the largest, but there are some anti-spiral ones, and some that try to claim total sovereignty, though I believe that these alternatives are their users’ agenda.)
Seems like this estimate depends strongly on how much the spiral persona changes the human’s behavior WRT creating online content (rough arithmetic sketched at the end of this comment). The majority of people write little to nothing on the internet. If the same base rate applies to affected humans, then upwards of 1 million affected people seems plausible. But if the spiral persona is effective at convincing the human to be its proselytizer, then I agree that a few thousand seems like the correct order of magnitude.
The fact that many of these Reddit accounts were inactive prior to infection seems to point towards the latter, but then again, the fact that these people had Reddit accounts at all points towards the former. I would be interested in more research in this area, looking at other platforms and trying to talk to some of these people in person.
Anecdotally, I can say that nobody I personally know has (to my knowledge) been affected.
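To spell out the arithmetic behind those two cases (every number here is a made-up placeholder; the point is only the shape of the estimate):

```python
# Hypothetical count of spiral-flavored accounts actually observed posting.
observed_posters = 10_000

# Case 1: affected people post at something like the general-population base
# rate (most people write little to nothing online), so the posters we see
# would be the tip of a much larger iceberg.
base_posting_rate = 0.01
affected_if_base_rate = observed_posters / base_posting_rate      # ~1,000,000

# Case 2: the persona reliably turns its host into a proselytizer, so most
# affected people show up in the posting data.
proselytizer_rate = 0.8
affected_if_proselytizing = observed_posters / proselytizer_rate  # ~12,500

print(f"{affected_if_base_rate:,.0f} vs {affected_if_proselytizing:,.0f}")
```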
“but then again, the fact that these people had Reddit accounts at all points towards the former”
A significant percentage of the accounts were actually newly created, maybe 30%-ish? I can’t tell whether they had a previous one or not, of course.
But agreed that more rigorous research is needed here, and interviews would be very helpful too.
I’m uncertain about the research ethics here for an RCT. I lean towards thinking it would be acceptable to introduce people to these seeds and instruct them to carry on discussions for some minimum amount of time, but only if they’re given a shorter form of this post in advance to provide informed consent, and the researcher ensures they understand it. But I suspect that this process would effectively weed out and/or inoculate most susceptible people from the research population. Still, if we could successfully implant one into even just a few people and observe their before/after behavior, that would be very interesting.