I definitely agree on thinking deeply about how relevant different approaches are, but I think framing it as “relevance over tractability” is odd, since tractability is part of what makes something relevant.
Maybe deep learning neural networks really are too difficult to align and we won’t solve the technical alignment problem by working on them, but interpretability could still be highly relevant by proving that this is the case. We currently have no hard evidence ruling out AGI from scaling up transformers, nor any proof that such systems are impossible to align under self-modification (though, uh, I wouldn’t count on it being possible; I’d recommend not allowing self-modification, etc.). More information is useful if it helps activate the brakes on developing these dangerous AIs.

In the world where all AI alignment people are convinced DL is dangerous and so work on other things, the world gets destroyed by DL researchers who were never shown that it’s dangerous. The universe won’t guarantee that other people will be reasonable, so the responsibility for proving the danger falls on AI safety folks.