But I think it’s more subtle than most people think. You hear a lot of people talk about AI capabilities and AI alignment as if they were orthogonal vectors: you’re bad if you’re a capabilities researcher and good if you’re an alignment researcher. It sounds very reasonable, but they’re almost the same thing. Deep learning is just going to solve all of these problems, and so far that’s what the progress has been. Progress on capabilities is also what has let us make the systems safer, and vice versa, surprisingly. So I think none of the sound-bite easy answers work.
Pointing this out since I don’t really agree with it; at the very least, I don’t think capabilities and safety are anywhere near the same thing. I also note a motivated-reasoning alert here: this is exactly what someone would write to reinforce the belief that progress on AI capabilities is good, since the inconvenient world where the Orthogonality Thesis and instrumental convergence hold would be personally disastrous for OpenAI.