One more reason iterative design could fail is if we build AI systems with low corrigibility. If we build a misaligned AI with low corrigibility that isn't doing what we want, we might be unable to shut it down or change its goals. I think this is one of the reasons why Yudkowsky believes we have to get alignment right on the first try.