The things AI systems today can do are already hitting pretty narrow targets. E.g., generating coherent English text is not something you’d expect from a random neural network. Why is corrigibility so much more of a narrow target than that? (I think Rohin may have said this to me at some point.)
I’ll note that this is framed a bit too favorably to me, the actual question is “why is an effective and corrigible system so much more of a narrow target than that?”