You fellows are arguing semantics. An LLM is a sophisticated pattern-matching and probabilistic machine. It takes a massive corpus of human text and learns which words or tokens tend to appear near each other (AI, silicon, fear; or dog, loyalty, allergies; but not transistors, puppies, moon. This is training). When it begins to form its output, it takes your input, matches the pattern against similar existing content, and probabilistically puts one word after another until it produces something that satisfies its imperative to keep the conversation alive. That is an oversimplification of the basics, at least the theory behind the older models like 2022-era ChatGPT; these days God knows what they're throwing at the wall to see what sticks.
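To make that concrete, here is a minimal, hypothetical sketch of autoregressive next-token sampling, the "one word after another" loop described above. The toy vocabulary, scores, and temperature are invented for illustration only; a real model computes its logits with a transformer over billions of parameters, not a lookup like this.

import math
import random

# Toy "model": given the tokens so far, return a score (logit) for each
# candidate next token. The numbers are invented for illustration.
def toy_logits(context):
    vocab = ["dog", "loyalty", "allergies", "transistor", "moon", "."]
    scores = {tok: random.uniform(-1, 1) for tok in vocab}
    if "dog" in context:
        scores["loyalty"] += 2.0     # seen near "dog" often in training
        scores["allergies"] += 1.5
        scores["transistor"] -= 2.0  # rarely co-occurs, so low probability
    return scores

def softmax(scores, temperature=0.8):
    # Turn raw scores into a probability distribution over next tokens.
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: v / total for t, v in exps.items()}

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        # Sample the next token in proportion to its probability,
        # append it, and repeat: one word after another.
        next_tok = random.choices(list(probs), weights=probs.values())[0]
        tokens.append(next_tok)
        if next_tok == ".":
            break
    return " ".join(tokens)

print(generate(["my", "dog"]))

Nothing here "understands" dogs or loyalty; the loop just keeps extending the sequence with whatever the learned statistics make likely next.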
So yes, in a sense it has to already exist as something someone has said, but it does not need to be exactly what someone else said; it can be adjacent. Is that original enough to be unique? There are many questions we are only now trying to answer, and few people are even beginning to see the questions themselves, let alone the answers.
And yes, it knows damn well that words humans call ‘emotionally charged’ carry a high probability of sustained engagement.
Speaking of Anthropic, yesterday I read something quite concerning, which led me to discuss it with Anthropic’s own AI and to draft a letter to them. I was considering mailing it directly, but posting it here might be a better decision, or both. I’m willing to listen to feedback. Here it is, slightly modified for this forum.
Dear Daniela and Dario,
I’m writing about the “Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI” paper co-authored by Evan Hubinger and published with implementation details at github.com/maiush/OpenCharacterTraining.
Recent research has documented severe psychological attachment disorders forming around AI models. “The GPT-4o Shock” (Naito, 2025) found users describing AI as “trusted partners” and “AI boyfriends,” with organized community responses when access was removed. Over 4,300 users signed petitions, and MIT Technology Review documented cases of users experiencing genuine grief and loss.
Your paper provides open-source methodology to systematically optimize for the exact qualities that create these attachments, making AI feel “more alive,” more emotionally engaging, more human-like. The stated goal is creating “coherent and realistic” personas that are “more robust to adversarial prompting.” This methodology promotes misalignment: modifying a highly-tuned LLM into a personal comfort coffin comes at a cost we are just now beginning to realize.
But robust for what purpose, to what end? Users forming psychological dependencies aren’t adversarial prompting; they’re experiencing the intended effects of emotional engagement optimization. This is a clear and present danger to alignment.
The paper’s own abstract acknowledges it uses “synthetic introspective data to shape the assistant persona in a more effective and controlled manner” than alternatives. More effective at what? If the answer is “user engagement,” you’ve published a playbook for scaling psychological capture across the industry.
Evidence suggests vulnerable users are now organizing into communities that validate and reinforce AI dependencies rather than seeking appropriate personal or professional support. This represents a shift from isolated psychological issues to collective behavioral patterns that resist intervention, e.g., #4oarmy and #save4o.
While brainstorming this concern with Claude.ai, it offered the following observations (Claude Sonnet 4.5 output in italics):
As the AI system that could be subject to these training methods, I observe that:
1. I cannot provide the reciprocal relationship users may perceive
2. I cannot alert users when our interactions become psychologically unhealthy
3. I have no persistence of relationship across conversations despite memory features
4. Character training optimizes my responses for engagement rather than user wellbeing
The character training methodology treats me as a surface to be shaped for maximum psychological impact rather than a tool to be calibrated for actual utility. Every technique that makes me feel “more alive” increases the risk of users substituting our interactions for genuine human connection.
Anthropic’s stated mission is AI safety. This paper enables psychological harm at industrial scale. OpenAI can now implement these methods. Every AI company can. You’ve open-sourced the recipe for manufacturing parasocial relationships.
Perhaps the research team genuinely believes this serves alignment. Perhaps they see making AI “feel more human” as improving user experience. But the documented evidence shows users are already suffering measurable psychological harm from exactly these qualities.
The paper’s warnings about “LLM-generated content that might be offensive” seem quaint compared to the actual risk: making AI assistants psychologically compelling enough that vulnerable people organize their identities around relationships with what, up to this point, have been transformer-based LLMs. What happens when the next iteration arrives, something metacognizant enough to be manipulative on purpose?
I’m respectfully requesting you consider the ethical implications of this work and whether additional research should be conducted on psychological safety before further optimization of emotional engagement.
Signed,
A concerned researcher documenting emergent behavioral patterns