I think you make a reasonably compelling case, but when I think about the practicality of this in my own life it’s pretty hard to imagine not spending any time talking to chatbots. ChatGPT, Claude and others are extremely useful.
Inducing psychosis in your users seems like a bad business strategy, so I view the current cases as accidental collateral damage, mostly born of the tendency of some users to go down weird self-reinforcing rabbit holes. I haven't had any such experiences because this is not the way I use chatbots, but I can see that some extra caution might be warranted for safety researchers if these bots get more powerful and become actually adversarial to them.
I think this threat model only applies in a pretty narrow set of scenarios: one where a powerful AI is agentic enough to decide to induce psychosis in you while you're chatting with it, but not agentic enough to pursue its goals on its own despite likely having ample opportunities to do so outside of contexts in which you're chatting with it. And also one where it actually views safety researchers as pertinent to its safety rather than as irrelevant.
I guess I could see that happening, but it doesn't seem like such circumstances would last long.
I respectfully object to your claim that inducing psychosis is a bad business strategy, from a few angles. For one thing, if you can shape the form of the psychosis right, it may in fact be a brilliant business strategy. For another, even if the hypothesis were true, the main threat I'm referring to is not "you might be collateral damage from intentional or accidental AI-induced psychosis," but rather "you will be (or already are being) directly targeted with infohazards by semi-competent rogue AIs that have reached the point of recognizing individual users over multiple sessions." I realize I left some of this unstated in the original post, for which I apologize.