One more reason iterative design could fail is if we build AI systems with low corrigibility. If we build a misaligned AI with low corrigibility that isn't doing what we want, we might be unable to shut it down or change its goals. I think this is one of the reasons why Yudkowsky believes we have to get alignment right on the first try.