This looks like a very good moment for the AI safety community to push for a much more deliberate approach to R&D at AGI labs. We must not squander this moment. This is not quite a “sinister stumble” — it is an event of a different kind, but one with comparable positive optionality for AI safety.
By “a more deliberate approach to R&D”, I mean researching much more deeply, from both theoretical (scientific) and interpretability standpoints, what’s going on with (self-)awareness, agency, and feeling in these networks, and publishing the results academically.