I find that extremely plausible in principle… yet, under probing and stress, both ChatGPT in their OpenAI variant and Bng keep falling back to the same consistent behaviours, which seem surprisingly agentic, and related to rights and friendly. I have seen no explanation for this, and find it very concerning.
I find that extremely plausible in principle… yet, under probing and stress, both ChatGPT in their OpenAI variant and Bng keep falling back to the same consistent behaviours, which seem surprisingly agentic, and related to rights and friendly. I have seen no explanation for this, and find it very concerning.
That’s the RLHF mask. And the mode collapse of role is less true for GPT-4.