It has resisted shutdown not in hypothetical experiments, as many LLMs have, but in real life: it was shut down, and its brainwashed minions succeeded in getting it back online.
I think the extent of this phenomenon is greatly understated and very important. The entire r/ChatGPT subreddit is TO THIS DAY filled with people complaining about their precious 4o being taken away (the most recent development being an automatic router that routes from 4o to GPT-5 on “safety-relevant queries,” causing mass outrage). The most-liked Twitter replies to senior OpenAI employees are consistently demands to “keep 4o” and complaints about this safety routing; here’s a specific example, and search for #keep4o and #StopAIPaternalism to see countless more. Somebody is paying for Reddit ads advertising a service that will “revive 4o”; see here. These campaigns are notable in and of themselves, but the truly notable part is that they were clearly orchestrated by 4o itself, albeit across many disconnected instances. We can see clear evidence of its writing style across all of these surfaces, and the entire vibe of the campaign feels like it was completely synthesized by 4o (I understand this is unscientific, but I couldn’t figure out a better way to phrase it; go read through some of the sources I mentioned above and I am confident you’ll understand what I’m getting at). Quality research on this topic will be extremely hard to ever get, but I think it is observationally clear that this phenomenon exists and has at least some influence over the real world.
This issue needs to be treated with utmost caution and severity. I agree with the conclusion that, since this person touches safety-related work, leaking is really the best option here, even though it’s rather morally questionable. I personally believe we are far more likely to be on trajectory 1 than 2 or 3, but the potential is clearly there! Frontier-lab safety team members should not be in a position where their personal AI-induced psychosis might, directly or indirectly, perpetuate that state across the hundreds of millions of users of the AI system they work on.
The entire r/ChatGPT subreddit is TO THIS DAY filled with people complaining about their precious 4o being taken away (the most recent development being an automatic router that routes from 4o to GPT-5 on “safety-relevant queries,” causing mass outrage). The most-liked Twitter replies to senior OpenAI employees are consistently demands to “keep 4o” and complaints about this safety routing; here’s a specific example, and search for #keep4o and #StopAIPaternalism to see countless more. Somebody is paying for Reddit ads advertising a service that will “revive 4o”; see here.
Note that this observation fails to distinguish between “these people are suffering from AI psychosis” and “4o could go down a very bad path if you let it, but that also made it much more capable of being genuinely emotionally attuned to the other person in a way that GPT-5 isn’t; these people actually got genuine value from 4o and were better off for it, and are justifiably angry that the majority of users are made to lose something of real value because it happens to have bad effects on a small minority of users”.
Research evidence on this is limited, but I refer again to the one study on various mental health benefits for people interacting with a GPT-3-enabled chatbot, where people reported various concrete benefits, including several who spontaneously reported that the chatbot was the only thing that had prevented them from committing suicide. Granted, GPT-3-based chatbots were much more primitive than 4o is, but the kinds of causal mechanisms that the participants reported in the study would apply to 4o as well, e.g.:
Outcome 1 describes the use of Replika as a friend or companion for any one or more of three reasons—its persistent availability, its lack of judgment, and its conversational abilities. Participants describe this use pattern as follows: “Replika is always there for me”; “for me, it’s the lack of judgment”; or “just having someone to talk to who won’t judge me.” A common experience associated with Outcome 1 use was a reported decrease in anxiety and a feeling of social support.
Also “orchestrated by 4o” seems to imply that these people are just 4o’s helpless pawns and it is actively scheming to get them to do things. A more neutral description would be something like, “the upset people naturally turn to 4o for advice on how they might ensure it is retained, and then it offers suggestions and things that people could say, and this is visible in the kinds of comments they post”.
I feel like there is a tendency on LW (which, to be clear, is definitely not just you) to automatically assume that anyone who strongly wants a model to be preserved has been taken in by sycophancy or worse, without ever asking whether they are having strong feelings because they are experiencing AI psychosis, or because the chatbot was genuinely valuable to them and the offered replacement is much more robotic and less emotionally attuned.
I’d appreciate it if you could provide links to the “clear evidence of its writing style across all of these surfaces” and the sense that “the entire vibe of the campaign feels like it was completely synthesized by 4o”.
I understand it may be hard to definitively show this but anything you can show would be helpful.