The kind of generalized misalignment I’m pointing to is more general than “the AI is not doing what I think is best for humanity”. It is, rather, “The people who created and operate the AI cannot control what it does, including in its interactions with other people.”
This includes “the people who created it (engineers) tried their hardest to make it benefit humanity, but it destroys humanity instead.”
But it also includes “the other people (users) can make the AI do things that the people who created it (engineers) tried their hardest to make it not do.”
If you’re a user trying to get the AI to do what the engineers wanted to stop it from doing (e.g., make it say mean things when they intended it not to say mean things), then your frustration is an example of the AI being aligned, not misaligned. The engineers were able to successfully give it a rule and have that rule followed rather than circumvented!
If the engineer who built the thing can’t keep it from swearing when you try to make it swear, then I expect the engineer also can’t keep it from blowing up the planet when someone gives it instructions that imply that it should blow up the planet.