I think the idea behind MAIM is to make it so neither China nor the U.S. can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.
If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If there’s a general agreement that’s trusted by all sides, then it’s substantially more likely superintelligence isn’t used to perform first strikes (and that it doesn’t kill everyone), because who would agree without strong guarantees against that?
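To make that loop concrete, here's a minimal toy sketch of the argument in Python (the function name and flags are my own hypothetical framing, not anything specified by the MAIM proposal):

```python
# Toy sketch of the deterrence logic above. Everything here is a
# hypothetical illustration, not part of the MAIM proposal itself.

def project_completes(rival_consents: bool, transparency_regime: bool) -> bool:
    """Under mutual transparency measures, a non-consented project is
    visible, and therefore easy to sabotage before it completes."""
    if transparency_regime and not rival_consents:
        return False  # visible and unsanctioned: the rival MAIMs it
    return True       # consented (or covert) projects can proceed

# With the regime in place, consent becomes a hard requirement:
assert project_completes(rival_consents=True, transparency_regime=True)
assert not project_completes(rival_consents=False, transparency_regime=True)

# Without the regime, a covert project might still complete, which is
# exactly the failure mode the agreement is meant to rule out:
assert project_completes(rival_consents=False, transparency_regime=False)
```

The whole scheme rests on the `transparency_regime` flag staying true: the measures that keep projects visible are what turn "we could sabotage you" into "you need our consent."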
(Unfortunately, while humanity does have experience with control of dual-use nuclear technology, the dual uses of superintelligence are far more tightly intertwined—you can’t as easily prove “hey, this is just a civilian nuclear reactor, we’re not making weapons-grade stuff here”. But an attempt is perhaps worthwhile.)
I think MAIM might only convince people who have p(doom) < 1%.
If we’re at the point that we can convincingly say to each other “this AGI we’re building together cannot be used to harm you,” we are way closer to p(doom) == 0 than we are right now, IMHO.
Otherwise, why would a U.S. or Chinese promise to do AGI research in a MAIMable way be any more convincing than the alignment strategies that would first be necessary to trust AGI at all? The risk is “anyone gets AGI” until p(doom) is low, and once it is low I’m not sure any particular country would forgo AGI just because the builder isn’t a perfect political match: for one random blob of humanness to convince an alien-minded AGI to preserve the aspects of itself it cares about, the result is likely to encompass 99.9% of what the other human blobs care about anyway.
Where that leaves us: if the U.S. and China have very different estimates of p(doom), they are unlikely to cooperate at all in making AGI progress legible to each other. And if they have similar p(doom), then very roughly they either cooperate strongly to prevent all AGI or cooperate to build the same thing.
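As a toy illustration of that mapping (the 1% threshold is borrowed from my earlier guess, and the similarity margin is entirely made up):

```python
# Toy illustration of the posture-vs-p(doom) mapping just described.
# The 0.01 cutoff echoes the "p(doom) < 1%" line above; the similarity
# margin is an invented parameter. A sketch, not a serious model.

def predicted_posture(p_doom_us: float, p_doom_china: float,
                      similarity_margin: float = 0.05) -> str:
    if abs(p_doom_us - p_doom_china) > similarity_margin:
        return "no cooperation on legibility"
    if max(p_doom_us, p_doom_china) < 0.01:
        return "cooperate to build the same thing"
    return "cooperate strongly to prevent all AGI"

print(predicted_posture(0.30, 0.02))    # no cooperation on legibility
print(predicted_posture(0.005, 0.008))  # cooperate to build the same thing
print(predicted_posture(0.40, 0.42))    # cooperate strongly to prevent all AGI
```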