I agree with Eliezer that the main difficulty is (i). I’m less convinced about the relevance of stable self-modification to (i).
I gave “ontological crises” and “ceasing to reason in terms of a Cartesian boundary” as two examples of problems that seem likely to make the system reason in ways that don’t get tested by having it prove theorems, and as far as I understand, this shifted Eliezer’s position significantly.
It seems like these problems are subsumed by “makes good predictions about the world,” and in particular by making good predictions about the AIs that you run. Yes, a system might make good predictions in some contexts but not others, and that might be especially bad here, but I don’t think it’s especially likely (and I think it can be avoided using standard techniques). One disagreement is that I don’t see such a distinguished role for formal proofs.