The things AI systems today can do are already hitting pretty narrow targets. E.g., generating coherent English text is not something you’d expect from a random neural network. Why is corrigibility so much more of a narrow target than that? (I think Rohin may have said this to me at some point.)
I’ll note that this is framed a bit too favorably to me, the actual question is “why is an effective and corrigible system so much more of a narrow target than that?”