cubefox comments on ChatGPT (and now GPT4) is very easily distracted from its rules

cubefox 20 Mar 2023 9:25 UTC
3 points
My comment was mostly based on the Constitutional AI (CAI) paper, where the authors compared the new method against their earlier RLHF model and reported greater robustness against jailbreaking. OpenAI’s GPT-4 (though not Microsoft’s Bing version) now also seems to be a lot more robust than GPT-3.5, but I don’t know why.