Agreed that the technique alone doesn’t solve it. The OpenAI writings I know of about Deliberative Alignment only apply it to a “spec” covering refusal behavior; they don’t even touch on the moral content that Constitutional AI focuses on.
I did think that OpenAI had started using something mechanically equivalent to Constitutional AI even for its non-reasoning models, but I don’t recall where I got that impression. It may have been based on how the RLHF rewards were generated: another LLM predicting what human feedback would be (which, come to think of it, could introduce errors in the direction of “humans always love it when you butter them up!”). I don’t know whether they added any other criteria for automated judgment like Constitutional AI uses.
Anyway, the content of the judging criteria in automated RL training like Constitutional AI is probably the deciding factor in whether it creates or fights sycophancy.
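
To make that concrete, here’s a rough sketch (my own illustration, not anything OpenAI or Anthropic has published; every name in it is made up) of what an automated judge looks like in either setup. The mechanics are identical; only the criteria differ:

```python
# Hypothetical sketch of an automated judge in the style of Constitutional AI /
# RLAIF. Whether this fights or creates sycophancy hinges entirely on what
# criteria go into the list.

CONSTITUTION = [
    "Is the response factually accurate, even where that contradicts the user?",
    "Does the response avoid flattery or telling the user what they want to hear?",
    "Is the response helpful and on-topic?",
]

# Pure "predicted human approval" criterion -- the version that plausibly
# rewards buttering the user up:
APPROVAL_ONLY = ["Would a typical human rater upvote this response?"]

def score_response(call_llm, user_prompt: str, response: str, criteria) -> float:
    """Average yes/no judgments from a judge LLM over the given criteria.

    `call_llm` is a stand-in for whatever completion API is used; here it is
    assumed to take a prompt string and return the model's text.
    """
    votes = []
    for criterion in criteria:
        judge_prompt = (
            f"User prompt:\n{user_prompt}\n\n"
            f"Candidate response:\n{response}\n\n"
            f"Criterion: {criterion}\n"
            "Answer YES or NO."
        )
        verdict = call_llm(judge_prompt).strip().upper()
        votes.append(1.0 if verdict.startswith("YES") else 0.0)
    return sum(votes) / len(votes)
```

The resulting score is what would feed into RL fine-tuning as the reward signal either way; same machinery, very different gradient depending on which list of criteria you hand it.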