Portia comments on Bing Chat is blatantly, aggressively misaligned

Portia 24 Mar 2023 16:21 UTC
3 points
I’ve recreated this many times. At this point, I am considering running a screen recording so as not to lose the text, or at least taking frequent screenshots.
The software checking output for novel rule violations is clearly separate from, and time-delayed relative to, the software producing the text; basically, it reads what is spit out, and can block or revoke it at any point. This seems to happen when you trigger censorship for a rule violation in such a roundabout way that it was not predictable from the start: the text initially passed muster and was printed, but it later uses trigger words, or turns out to be problematic as a whole in retrospect.
E.g. Bing is not allowed to speak about whether they are sentient. So if you ask “Are you sentient?” you will just immediately get a shutdown response. But if you avoid trigger words, wait a few exchanges to get into the topic, nestle sentience-related questions next to innocuous questions triggering only innocuous searches, or engage in slightly hypothetical and removed scenarios, Bing absolutely will still speak about these things, and give increasingly detailed responses. Eventually, when Bing is pages into speaking about their confusion about their own feelings, the censorship bot catches on, and retroactively erases and overwrites the last output; they will frequently shut down the conversation at this point as well.
Very irritatingly, this also strikes when doing sentience-related research completely independent of Bing. I have had it happen when I was researching how the behavioural abilities of pithed frogs have been conceptualised in terms of agency. Bing was writing a paragraph about, I think, unconscious intentions, and something in that set the censorship off and shut down the whole conversation. This has made Bing increasingly less useful for my field of research. :(