In retrospect, I was a bit too optimistic about this working out, and a big part of why is that I didn’t truly grasp how deep value conflicts can be even among humans. I’m now much more skeptical of multi-alignment schemes, because I believe a lot of human alignment exists broadly because individuals are powerless relative to the state; once AI is capable enough to create its own nation-states, value conflicts become much more consequential, and the basis for a lot of cooperative behavior collapses.

This also means the level of alignment we need from AI is closer to that of the fictional benevolent angels than to how humans relate to other humans, which motivates a more ambitious version of the alignment objectives than merely making AIs not break the law or steal from humans.
I’m actually reasonably hopeful that these more ambitious versions of alignment are possible, and I think there’s a realistic chance we can achieve them.