I think MAIM might only convince people who have p(doom) < 1%.
If we’re at the point where we can convincingly say to each other “this AGI we’re building together cannot be used to harm you,” we’re already far closer to p(doom) == 0 than we are right now, IMHO.
Otherwise, why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the alignment strategies that would first be necessary to trust AGI at all? The risk is “anyone gets AGI” until p(doom) is low. And once it is low, I’m not sure any particular country would forgo AGI just because the builder wasn’t perfectly aligned with it politically: for one random blob of humanness to convince an alien-minded AGI to preserve the aspects of itself it cares about, the result is likely to encompass 99.9% of what other human blobs care about anyway.
Where that leaves us: if the U.S. and China have very different estimates of p(doom), they are unlikely to cooperate at all in making AGI progress legible to each other. And if their estimates are similar, they either cooperate strongly to prevent all AGI, or cooperate to build the same thing, very roughly.