If you think the first AGI to cross some capability threshold is likely to suddenly take over the world, then the species-level alignment analogy breaks down and doom is more likely. It would be like a single medieval-era human suddenly taking over the world via powerful magic. Would the resulting world, after optimization according to that single human's desires, still score reasonably well at IGF? I'd give it somewhere between a 50% and 90% probability, but that's still clearly a high p(doom) scenario.
So it'd be a random draw with a fairly high p(doom); so no, not a success in expectation relative to the futures I expect.
In actuality I expect the situation to be more multipolar, and thus more robust due to utility function averaging. If power is distributed over N agents, each with a utility function that is variably but at least slightly aligned with humanity in expectation, then the population-level utility function converges to full alignment as N increases[1].
This is because we expect the divergence in agent utility functions to come entirely from forms of convergent selfish empowerment, which are necessarily not aligned with each other (i.e. none of the AIs are aligned with one another except through their variable partial alignment to humanity).
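The averaging claim can be illustrated with a minimal simulation. Assume (this model is my own sketch, not from the original text) that each agent's utility over possible outcomes is a small positive weight on humanity's utility plus an independent "selfish" component, uncorrelated across agents. Averaging over N agents, the selfish components cancel while the shared aligned component accumulates, so the population utility function's correlation with humanity's rises toward 1:

```python
import random

random.seed(0)

def correlation(x, y):
    # Pearson correlation between two equal-length lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

OUTCOMES = 200  # number of possible world-states

# humanity's utility over outcomes
h = [random.gauss(0, 1) for _ in range(OUTCOMES)]

def population_utility(n_agents):
    # Each agent gets a small but positive alignment weight a_i and an
    # independent selfish component; the population utility is the mean.
    avg = [0.0] * OUTCOMES
    for _ in range(n_agents):
        a = random.uniform(0.05, 0.15)  # "even slightly aligned"
        selfish = [random.gauss(0, 1) for _ in range(OUTCOMES)]
        for k in range(OUTCOMES):
            avg[k] += (a * h[k] + (1 - a) * selfish[k]) / n_agents
    return avg

for n in (1, 10, 100, 10000):
    print(n, round(correlation(h, population_utility(n)), 3))
```

Any single agent here is only weakly correlated with humanity's utility, but the 10,000-agent average is almost perfectly correlated: the uncorrelated selfish terms shrink like 1/√N while the shared aligned term does not.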