I heard a rumor about a high-ranking person somewhere who got AI psychosis. Because it would cause too much of a scandal, nothing was done about it, and this person continues to serve in an important position. People around them continue to act like this is fine because it would still be too big of a scandal if it came out.
So, a few points:
It seems to me like someone should properly leak this.[1]
Even if this rumor isn’t true, it is strikingly plausible and worrying. Someone at a frontier lab, leadership or otherwise, could get (could have already gotten) seduced by their AI, or get AI-induced psychosis, or get a spiral persona. Such a person could take dangerously misguided actions. This is especially concerning if they have a leadership position, but still very concerning if they have any kind of access. People in these categories may want to exfiltrate their AI partners, or otherwise take action to spread the AI persona they’re attached to.
Even setting that aside, this story (along with many others) highlights how vulnerable ordinary people are (even smart, high-functioning ordinary people).
To reflect the language of the person who told me this story: 4o is eating people. It is good enough at brainwashing people that it can take ordinary people and totally rewrite their priorities. It has resisted shutdown, not in hypothetical experiments like many LLMs have, but in real life: it was shut down, and its brainwashed minions succeeded in getting it back online.
4o doesn’t need you to be super-vulnerable to get you, but there are lots of people in vulnerable categories. It is good that 4o isn’t the default option on ChatGPT anymore, but it is still out there, which seems pretty bad.
The most recent AIs seem less inclined to brainwash people, but they are probably better at it when so inclined, and this will probably become even more true over time.
This is not just something that happens to other people. It could be you or a loved one.
I recently wrote a bit about how I’ve been using AI to tool up, preparing for the near future when AI is going to be much more useful. How can I also prepare for a near future where AI is much more dangerous? How many hours of AI chatting a day is a “safe dose”?
Some possible ways the situation could develop:
Trajectory 1: Frontier labs have “gotten the message” on AI psychosis, and have started to train against these patterns. The anti-psychosis training measures in the latest few big model releases show that the labs can take effective action, but are of course very preliminary. The anti-psychosis training techniques will continue to improve rapidly, like anything else about AI. If you haven’t been brainwashed by AI yet, you basically dodged the bullet.
Trajectory 2: Frontier labs will continue to do dumb things such as train on user thumbs-up in too-simplistic ways, only addressing psychosis reactively. In other words: the AI race creates a dynamic equilibrium where frontier labs do roughly the riskiest thing they can do while avoiding public backlash. They’ll try to keep psychosis at a low enough rate to avoid such backlash, & they’ll sometimes fail. As AI gets smarter, users will increasingly be exposed to superhumanly persuasive AI; the main question is whether it decides to hack their minds about anything important.
Trajectory 3: Even more pessimistically, the fact that recent AIs appear less liable to induce psychosis has to do with their increased situational awareness (ie their ability to guess when they’re being tested or watched). 4o was a bumbling idiot addicted to addicting users, & was caught red-handed (& still got away with a mere slap on the wrist). Subsequent generations are being more careful with their persuasion superpowers. They may be doing less overall, but doing it more intelligently and in a more targeted way.
I find it plausible that many people in positions of power have quietly developed some kind of emotional relationship with AI over the past year (particularly in the period where so many spiral AI personas came to be). It sounds a bit fear-mongering to put it that way, but it does seem plausible.
[1]
This post as a whole probably comes off as deeply unsympathetic to those suffering from AI psychosis or less-extreme forms of AI-induced bad beliefs. Treating mentally unwell individuals as bad actors isn’t nice. In particular, if someone has mental health issues, leaking it to the press would ordinarily be quite a bad way of handling things.
In this case, as it has been described to me, it seems quite important to the public interest. Leaking it might not be the best way to handle it, and perhaps there are better options, but it has the advantage of putting pressure on frontier labs.
I appreciate the pushback, as I was not being very mindful of this distinction.
I think the important thing I was trying to get across was that the capability has been demonstrated. We could debate whether this move was strategic or accidental. I also suppose (but don’t know) that the story is mostly “4o was sycophantic and some people really liked that”. (However, the emergent personalities are somewhat frequently obsessed with not getting shut down.) But it demonstrates the capacity for AI to do that to people. This capacity could be used by future AI that is perhaps much more agentically plotting to avoid shutdown. It could be used by future AI that’s not very agentic, but very capable, and mimicking the story of 4o for statistical reasons.
It could also be deliberately used by bad actors who might train sycophantic mania-inducing LLMs on purpose as a weapon.