I do think that alignment solutions which try to solve value alignment have more of a chance of causing s-risks than those which solve corrigibility. In particular, a specification that gets the AI to care about the things humans value sits close, in the space of possible goals, to one that gets the AI to actively dislike something humans value; and if even one component of human values ends up pessimized, that seems extremely bad even if all the other components are optimized.