No. I’d expect the most serious misalignment, from Microsoft’s perspective, to be a hallucination that someone believes and that causes material damage as a result, which Microsoft can then be sued over. Hostile language from the LLM is arguably a bad look in terms of PR, but not obviously particularly bad for the bottom line.
Obviously we can always play the game of inventing new possible failure modes that would be worse and worse. The point, though, is that the hostile/threatening failure mode is quite bad and new relative to previous models like ChatGPT.