lc comments on Stop posting prompt injections on Twitter and calling it “misalignment”

lc 19 Feb 2023 2:47 UTC
14 points
9
If that were true, then the AI still wouldn’t be “misaligned”, because it’s not acting with agency at all; it’s being used by an agent against the wishes of its creator. You wouldn’t call someone using a DeepFake model to generate porn “misalignment”, and you’re probably not signaling much about OpenAI’s ability to handle the actual critical technical safety problems by developing such hacks. You could call the AI-human system “misaligned”, if you’re being generous, but then you have to start calling lots of tool-human systems “misaligned”. And how is it OpenAI’s fault that there’s (also) a (literal) human pilot in this system trying to crash the plane?

My guess is that the entire premise is false though, and that OpenAI actually just doesn’t care.