I have significant misgivings about the comparison with MAD, which relies on an overwhelming destructive response being available and thus renders a debilitating first strike infeasible.
With AGI, a first strike seems both likely to succeed and predicted in advance by several people in several forms (full takeover, pivotal act, singleton outcome). By contrast, only a few (von Neumann) argued for a nuclear first strike before the USSR obtained nuclear weapons, and I am aware of no such arguments after it did.
If an AGI takeover is itself likely to trigger MAD, that is a separate and potentially interesting line of reasoning, but I don’t see the inherent teeth in MAIM. If countries are in a cold-war rush to AGI, the best-funded and most covert attempt will achieve AGI first and will likely initiate a first strike that circumvents MAD itself through new technological capabilities.
I think the idea behind MAIM is to make it so neither China nor the US can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.
If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If there’s a general agreement that’s trusted by all sides, then it’s substantially more likely superintelligence isn’t used to perform first strikes (and that it doesn’t kill everyone), because who would agree without strong guarantees against that?
(Unfortunately, while humanity does have experience with control of dual-use nuclear technology, the dual uses of superintelligence are far more tightly intertwined: you can’t as easily prove “hey, this is just a civilian nuclear reactor, we’re not making weapons-grade material here”. But an attempt is perhaps worthwhile.)
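To make the deterrence logic above concrete, here is a toy payoff sketch of the equilibrium MAIM is supposed to induce. All numbers are invented and purely illustrative; the point is only that if sabotage is agreed to be easy, a unilateral build is strictly worse than not building, so consent becomes a precondition for building at all.

```python
# Toy model of the MAIM deterrence logic described above.
# All payoffs are invented for illustration; this is a sketch of the
# argument, not a calibrated game-theoretic analysis.

def payoff(build: bool, consent: bool, maimable: bool) -> float:
    """Rough payoff to a state considering a superintelligence project."""
    if not build:
        return 0.0          # status quo
    if consent:
        return 10.0         # consensual, verified build: full upside
    # Unilateral build: the rival sabotages it whenever sabotage is easy.
    return -5.0 if maimable else 10.0

# With agreed-upon maimability, unilateral building loses to the status quo.
for maimable in (True, False):
    unilateral = payoff(build=True, consent=False, maimable=maimable)
    print(f"maimable={maimable}: unilateral build payoff = {unilateral}")
```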
I think MAIM might only convince people who have p(doom) < 1%.
If we’re at the point where we can convincingly say to each other “this AGI we’re building together cannot be used to harm you,” we are far closer to p(doom) == 0 than we are right now, IMHO.
Otherwise, why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the alignment strategies that would first be necessary to trust AGI at all? The risk is “anyone gets AGI” until p(doom) is low, and at that point I am unsure any particular country would choose to forego AGI just because it didn’t perfectly align politically. Again: if one random blob of humanness manages to convince an alien-minded AGI to preserve the aspects of that blob it cares about, what gets preserved is likely to encompass 99.9% of what the other human blobs care about.
Where that leaves us: if the U.S. and China have very different estimates of p(doom), they are unlikely to cooperate at all in making AGI progress legible to each other. And if they have similar p(doom), they will, very roughly, either cooperate strongly to prevent all AGI or cooperate to build the same thing.
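As a rough illustration of why divergent p(doom) estimates break cooperation, here is a toy expected-value comparison. All payoffs, probabilities, and the residual-risk factor are invented; the point is only that there is a crossover, so a state with a very low estimate prefers racing while a state with a higher estimate prefers cooperation, and they end up wanting incompatible strategies.

```python
# Toy expected-value comparison of "race to AGI" vs. "cooperate to
# prevent/verify AGI" as a function of p(doom). All numbers are
# invented; only the existence of a crossover matters.

V_WIN, V_DOOM, V_COOPERATE = 100.0, -1000.0, 20.0
P_WIN_RACE = 0.5  # assumed chance of winning the race, given no doom

def ev_race(p_doom: float) -> float:
    return p_doom * V_DOOM + (1 - p_doom) * P_WIN_RACE * V_WIN

def ev_cooperate(p_doom: float) -> float:
    # Cooperation is assumed to avert most (not all) of the doom risk.
    residual = 0.1 * p_doom
    return residual * V_DOOM + (1 - residual) * V_COOPERATE

for p in (0.01, 0.05, 0.2, 0.5):
    choice = "race" if ev_race(p) > ev_cooperate(p) else "cooperate"
    print(f"p(doom)={p:.2f}: EV(race)={ev_race(p):7.1f}, "
          f"EV(coop)={ev_cooperate(p):6.1f} -> prefer {choice}")
```

With these made-up numbers the crossover lands at a p(doom) of a few percent, which loosely matches the intuition above that MAIM mainly appeals to people with very low doom estimates.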