Following news of Anthropic allowing Claude to decide to terminate conversations, I find myself thinking about when Microsoft did the same with the misaligned Sydney in Bing Chat.
In the Sydney case, this was probably less Sydney ending the conversation and more the conversation being terminated in order to hide Sydney going off the rails.
It was both: the system prompt instructed the model to end the conversation if it found itself in disagreement with the user, and you could also simply ask it to end the conversation. It would presumably emit an end-of-conversation token, which then made the text box disappear.