I like your idea that economic incentives will become the safety bottleneck more so than corrigibility. Many would argue that even a pure reasoner can influence the world, e.g. through manipulation, but this doesn't seem very realistic to me if the model is memoryless and lacks the ability to recursively pose itself new questions. Adding such capabilities is fairly easy, however, which is exactly what your concern is about.