A concerning aspect of this is that AI psychosis is a failure mode that emerges from long-term interaction with the LLM. That makes it expensive (and arguably unethical) to sample large numbers of user trajectories to feed into a post-training pipeline aimed at preventing it. Users also may not be in a good position to judge whether they have AI psychosis. Is there any public research on how the labs are trying to solve this?