Yeah, I think the hope of fully understanding a learned system was a fool's errand, and the dream of full interpretability was never actually possible. The sequences the world throws at you are very complicated, indexical complexity means your brain has to be more complicated still, and Shane Legg proved that Turing-computable learners can only predict and act on complex sequences by being comparably complex themselves.
To be frank, I think @Shane_Legg's paper predicted a lot of why MIRI's efforts didn't work: in practice, a computable theory of learning was far more complicated than people thought at the time, there was no clever shortcut, and the things that are easy to white-box are exactly the things we can't have, because they aren't computable by a Turing machine.
More generally, one of the flaws, in hindsight, of early LW work, especially before 2012-2013, was not realizing that relaxing the problem by introducing hypercomputers didn't give us new ideas. The relaxed problem had no real relation to the actual problem of making AI safe as AI progresses in this world, so solutions to one failed to transfer to the other.
Here's your citation, @Steven Byrnes, for the claim that Turing-computable learners can only predict and act on complex sequences by being comparably complex themselves.
Shane Legg (2006), "Is there an Elegant Universal Theory of Prediction?":
https://arxiv.org/abs/cs/0606070
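For anyone who wants the shape of the result rather than just the link, here is a rough, intuition-level paraphrase of what the paper argues, as I read it; the exact theorem statements, constants, and conditions are in the paper itself, and the sketch below is not a quotation of them.

```latex
% Rough paraphrase of the kind of results in Legg (2006),
% "Is there an Elegant Universal Theory of Prediction?" (arXiv:cs/0606070).
% This is an intuition-level sketch, NOT a verbatim statement of the theorems;
% exact statements, constants, and conditions are in the paper.
\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem*{claim}{Claim (paraphrase)}
\begin{document}

\begin{claim}
Let $p$ be any computable sequence predictor and let $K(\cdot)$ denote
Kolmogorov complexity. By diagonalization there exists a computable binary
sequence $\omega$ with
\[
  K(\omega) \;\le\; K(p) + O(1)
\]
on which $p$ keeps making prediction errors: the sequence simply outputs
the opposite of whatever $p$ predicts next.
\end{claim}

\begin{claim}
Consequently, any computable predictor $p$ that eventually predicts every
computable sequence $\omega$ with $K(\omega) \le n$ must itself be complex,
roughly
\[
  K(p) \;\gtrsim\; n
\]
up to lower-order terms. A simple predictor cannot handle all simple
sequences, let alone complex ones.
\end{claim}

\end{document}
```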
The immediate corollary is that as AI gets better, it is inevitably going to get more and more complicated by default. Interpreting AIs won't get any easier; it will only get harder and harder to interpret the learned parts of the AI.