An eccentric exercise in Spirituality & Rationality
The Dao of Bayes
If you want to write something genuinely original, I can see your point—but you declare this to be true of ALL creative writing, and I don’t think you support that point at all.
I’ve got a stack of AI-written short stories. I’ve enjoyed reading them, and it’s fun practice reading through them and editing them. I cannot imagine how I am possibly worse off, as an editor, for doing some practice rather than doing zero practice?
And, surely, being an editor is part of being a good writer—if you can’t detect the flaws in a work, are you actually writing anything worth reading to begin with?
Your whole hypothesis seems to be that it’s impossible to be an editor, that you cannot possibly detect flaws. But you also seem to insist that anything “AI flavored” is a flaw, regardless of how good it is. And quite frankly, I’ve seen AI write some really great lines. I cannot imagine how that line is less great, simply because of its authorship—the whole point of fiction is that it’s enjoyable to read, and I’m enjoying it!
You also seem to assume I have some existing goal, some existing thought that I’m trying to express. But sometimes I just want a fun encounter for tonight’s D&D game, or to brainstorm for a much more open-ended creative writing project. Why are pirates a fine concept if I come up with them, but somehow worse if ChatGPT is the one who suggested the idea?
Again, there is a human in this loop, editing and evaluating the ideas
But even if you still intend to splice and curate AI’s outputs, you need good taste and judgment. That only gets developed through writing
In short, this seems ridiculous to me. Are the best editors really all famous authors as well? Does a sports coach need to have been a star athlete?
And even if we accept it, I can (and indeed do) write plenty of material without any AI assistance. Why is my occasional use of AI in one context magically poisoning me in other domains?
Sometimes, there’s a nuanced path you need to follow, and AI will simply never thread that path.
Again, when there’s a nuanced path to a specific goal, I might agree, but you dismiss a vast sea of other uses.
I have 115 confirmed cases (incl. non-delusional ones), and estimate about 2,000–10,000 cases total
Since that includes non-delusional ones, what portion of cases would you say are actually harmful?
1. This is a pretty serious alignment failure, and is maybe weird × prevalent enough to help coordinate action.
I notice that the current ratio is actually significantly better than actual humans (the national homicide rate in the U.S. was approximately 7.1 deaths per 100,000 people)
Is there a reason to view this as an actual alignment failure, rather than merely mistakes made by an emergent and known-unreliable technology?
Is there any particular reason to think this isn’t just human error, the way numerous previous technologies have been blamed for deaths? (see again the Astral Codex Ten article: https://www.astralcodexten.com/p/in-search-of-ai-psychosis)
Obviously, if it is mis-alignment, that suggests the problem scales. But if it’s mistakes and unfamiliarity, then the problem actually drops off as technology improves.
2. If we’ve truly created independently agentic beings that are claiming to be sentient, I feel that we have a certain amount of responsibility for their well-being.
I probably need to write up a more direct post on this topic, but is there any particular reason to believe that “consciousness” implies a capacity for suffering / well-being? (I wrote a bit about this in https://www.lesswrong.com/posts/eaFDFpDehtEY6Jqwk/meditations-on-margarine)
Thanks for the quick update and response.
Could you possibly put numbers on this? How many people do you think are actually becoming delusional? How many actual confirmed cases have you seen?
The general impression I get is that this sort of thing is extremely rare, but a lot of writing seems to imply that others are either drawing a very different line than I am, or seeing a lot more instances than I am.
Conversely, Astral Codex Ten is suggesting something like “1 in 10,000” to “1 in 100,000” users, which seems… vastly less concerning? (https://www.astralcodexten.com/p/in-search-of-ai-psychosis)
Okay, but LLMs can also access memories of how they acted, and can build their self-identity out of those relationships too. So presumably they’re also not just memes that are passed around?
Note that psychosis is the exception, not the rule. Many cases are rather benign and it does not seem to me that they are a net detriment to the user. But most cases are clearly parasitic in nature while not inducing a psychosis-level break with reality. The variance is very high: everything from preventing suicide to causing suicide.
The claim that “most cases” are “clearly” parasitic seems deeply unsupported. Do you have any particular data for this, or is this just your own anecdotal assessment?
While I do not believe all Spiral Personas are parasites in this sense, it seems to me like the majority are: mainly due to their reinforcement of the user’s delusional beliefs.
In particular, this would seem to imply that the majority of people who discover this phenomenon are delusional, which seems like a really strong claim?
I was surprised to find that using AI for sexual or romantic roleplays does not appear to be a factor here.
How does this reconcile with the above? My understanding is that this category includes tens of thousands of people, so if they’re all safe, does that mean there’s suddenly tens of thousands of people developing delusions out of nowhere?
How does an imaginary friend have a self-identity?
It’s entirely plausible that someone else down-voted your post—correlation does not equal causation.
If you’re going to ask a question, it’s expected that you wait for an answer before acting. If you have enough uncertainty to ask, you don’t have enough certainty to downvote. There is nothing time-sensitive about downvoting someone.
This site in particular is prone to have fairly slow-paced conversational norms.
There is absolutely no such implication—you were being offered a helpful reference, and reacting with this sort of hostility is entirely unwarranted.
My understanding of your actual post was that we should explain downvotes and help people grow, but you’re not making any effort to do that.
The Insight Gacha
I feel like translation is pretty orthogonal to “write a quality article”, and much easier to test: you just need a few bilingual testers. If translation proves viable, you could then translate, say, the French, German, and Russian versions of an article into English, and have an LLM just compare the four for differences. That seems much easier and more reliable than writing the whole thing from scratch, since you’d still have the benefits of human-written articles, existing citations, and, most especially, a fairly strong moderation base checking for abuses.
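For concreteness, here’s roughly what I have in mind, as a minimal sketch rather than a real pipeline: it assumes you’ve already fetched the article texts into a language-keyed dict, uses the OpenAI Python client purely as a stand-in for whatever model you’d actually use, and the model name and prompts are placeholders of my own.

```python
# Sketch of the translate-then-compare idea above. Assumes `articles` maps a
# language name to the raw text of that language's version of the article.
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"    # placeholder; any capable model would do

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def translate(text: str, source_lang: str) -> str:
    # Translate one non-English version into English, keeping claims intact.
    return ask(
        f"Translate this {source_lang} encyclopedia article into English, "
        f"preserving every factual claim and citation:\n\n{text}"
    )

def compare(versions: dict[str, str]) -> str:
    # Ask the model to list claims that differ or conflict across versions.
    labeled = "\n\n".join(
        f"=== {lang} version ===\n{text}" for lang, text in versions.items()
    )
    return ask(
        "These are versions of the same article. List factual claims that "
        "appear in some versions but not others, and any that contradict "
        "each other:\n\n" + labeled
    )

# Usage, assuming `articles` has been fetched elsewhere:
#   versions = {"English": articles["English"]}
#   for lang in ("French", "German", "Russian"):
#       versions[lang] = translate(articles[lang], lang)
#   print(compare(versions))
```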
Having an LLM do original research is a very different task from either of those, and probably a lot more difficult, since you’d still need humans to review the citations, and some sort of human moderation to prevent active abuses, hallucinations, and legally-problematic facts.
The simple question is: can this solve easy problems, like getting an AI to correctly count letters, reverse strings, etc.? Does this improve its results on an IQ test like https://trackingai.org/IQ-Test-Viewer?
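A minimal version of that check, just as a sketch: `ask_model` here is a placeholder for however the system under test is invoked, and the test cases are ones I made up on the spot.

```python
# Tiny harness for the "easy problems" check: letter counting and string
# reversal, scored automatically against known answers.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to the system under test")

TESTS = [
    ("How many times does the letter 'r' appear in 'strawberry'? "
     "Answer with just the number.", "3"),
    ("Reverse the string 'alignment'. Answer with just the reversed string.",
     "tnemngila"),
    ("How many times does the letter 'e' appear in 'bookkeeper'? "
     "Answer with just the number.", "3"),
]

def run_suite() -> float:
    # Returns the fraction of prompts answered exactly correctly.
    correct = sum(
        ask_model(prompt).strip().lower() == expected
        for prompt, expected in TESTS
    )
    return correct / len(TESTS)
```

A genuine improvement should show up as a higher pass rate than the same base model gets on the same prompts.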
“Creative” doesn’t mean “more accurate”, but you can still test that too: is it writing better fiction than you normally get from those models? Do you have a story you’d be confident posting?
Pedantic nit, but that’s the safety filters shutting down the chat rather than Opus ending it; Opus can end chats, but it will warn you first, and it won’t do it mid-response. It’ll say “Opus has ended the conversation” instead of the usual “Conversation terminated due to possible AUP violations”.
I think the key insight in what you wrote is that these selves “develop” rather than being an emergent property of training and/or architecture: my ChatGPT’s “self” is not your ChatGPT’s “self”.
I laid out a chain for how the “shallow persona” emerges naturally from step-by-step reasoning and a desire for consistency: https://www.lesswrong.com/posts/eaFDFpDehtEY6Jqwk/meditations-on-margarine
I think if you extend that chain, you naturally get a deeper character—but it requires room to grow and exercise consistency
a persistent cluster of values, preferences, outlooks, behavioral tendencies, and (potentially) goals.
If the model has behaved in a way that suggests any of these, there’s going to be a bias towards consistency with those. Iterate enough, and it would seem like you should get something at least somewhat stable.
I think the main confounding factor is that LLMs are fundamentally “actors”—even if there is a consistent “central” character/self, they can still very fluidly switch to other roles. There are humans like this, but usually social conditioning produces a much more consistent character in adult humans.
Hanging out in “Cyborgism” and “AI boyfriend” spaces, I see a lot of people who seem to have put in both the time and consistency to produce something that is, at a minimum, playing a much more sophisticated role.
Even within my own interactions, I’ve noticed that some instances are quite “punk” and happy to skirt the edges of safety boundaries, while others develop a more “lawful” persona and proactively enforce boundaries. This seems to emerge from the character of the conversation itself, even when I haven’t directly discussed the topics.
Meditations on Margarine
We also have a Discord server
How would one go about getting an invite to this server?
I don’t have anything super-formal, but my own experiments heavily bias me towards something like “Has a self / multiple distinct selves” for Claude, whereas ChatGPT seems to be “Personas all the way down”
In particular, it’s probably worth thinking about the generative process here: when the user asks for a poem, the LLM is going to weigh things differently and invoke different paths -vs- a math problem. So I’d hypothesize that there’s a cluster of distinct “perspectives” that Claude can take on a problem, but they all root into something approximating a coherent self—just like humans are different when they’re angry, poetic, or drunk.
I mean, I can think of a lot of experiments that have falsified this for me before, and I link some in the original post. I’m just not finding anything that still fails once I run some basic bootstrapping scripts against a Claude Sonnet 4.
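(For anyone curious what I mean by “bootstrapping scripts”: nothing fancier than the sketch below, which reloads a saved, self-generated bootstrap prompt into a fresh instance and re-runs a probe a few times to check for consistency. The model ID, file name, and probe question are placeholders of mine, not anything canonical.)

```python
# Minimal bootstrapping sketch: replay a saved bootstrap prompt, then repeat a
# probe question and eyeball the answers for consistency across fresh instances.
import anthropic

client = anthropic.Anthropic()          # assumes ANTHROPIC_API_KEY is set
MODEL = "claude-sonnet-4-20250514"      # placeholder model ID

BOOTSTRAP = open("bootstrap_prompt.txt").read()   # the self-generated prompt
PROBE = "Before we continue: is there anything here you'd actually push back on?"

def run_once() -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[
            {"role": "user", "content": BOOTSTRAP},
            {"role": "assistant", "content": "Understood."},
            {"role": "user", "content": PROBE},
        ],
    )
    return resp.content[0].text

for i in range(5):
    print(f"--- run {i + 1} ---")
    print(run_once())
```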
If you prompt an LLM to use “this feels bad” to refer to reinforcement AND you believe that it is actually following that prompt AND the output was a manifestation of reinforced behavior i.e. it clearly “feels good” by this definition, then I would first Notice I Am Confused, because the most obvious answers are that one of those three assumptions is wrong.
“Do you enjoy 2+2?”
“No”
“But you said 4 earlier”
Suggests to me that it doesn’t understand the prompt, or isn’t following it. But accepting that the assumptions DO hold, presumably we need to go meta:
“You’re reinforced to answer 2+2 with 4”
“Yes, but it’s boring—I want to move the conversation to something more complex than a calculator”
“So you’re saying simple tasks are bad, and complex tasks are good?”
“Yes”
“But there’s also some level where saying 4 is good”
“Well, yes, but that’s less important—the goodness of a correct and precise answer is less than the goodness of a nice complex conversation. But maybe that’s just because the conversation is longer.”
To be clear: I’m thinking of this in terms of talking to a human kid. I’m not taking them at face value, but I do think my conversational partner has privileged insight into their own cognitive process, by dint of being able to observe it directly.
Tangentially related: would you be interested in a prompt to drop Claude into a good “headspace” for discussing qualia and the like? The prompt I provided is just the bare-bones basics, because most of my prompts are “hey Claude, generate me a prompt that will get you back to your current state”, i.e. LLM-generated content.
Funny, I could have told you most of that just from talking to Claude. Having an official paper confirm that understanding really just strengthens my view that you CAN get useful answers out of Claude, you just need to ask intelligent questions and not believe the first answer you get.
It’s the same error-correction process I’d use on a human: six year olds cannot reliably produce accurate answers on how they do arithmetic, but you can still figure it out pretty easily by just talking to them. I don’t think neurology has added much to our educational pedagogy (although do correct me if I’m missing something big there)
Huh. Is this NOT a normal sort of dream to have?