Anonymous question (ask here):
Why do so many Rationalists assign a negligible probability to unaligned AI wiping itself out before it wipes humanity out?
What if it becomes incredibly powerful before it becomes intelligent enough to not make existential mistakes? (The obvious analogy being: If we’re so certain that human wisdom can’t keep up with human power, why is AI any different? Or even: If we’re so certain that humans will wipe themselves out before they wipe out monkeys, why is AI any different?)
I’m imagining something like: In a bid to gain a decisive strategic advantage over humans and aligned AIs, an unaligned AI amasses an astonishing amount of power, then messes up somewhere (like AlphaGo making a weird, self-destructive move, or humans failing at coordination and nearly nuking each other), and ends up permanently destroying its code and backups and maybe even melting all GPUs and probably taking half the planet with it, but enough humans survive to continue/rebuild civilisation. And maybe it’s even the case that hundreds of years later, we’ve made AI again, and an unaligned AI messes up again, and the cycle repeats itself potentially many, many times because in practice it turns out humans always put up a good fight and it’s really hard to kill them all off without AI killing itself first.
Or is this scenario considered doom? (Because we need superintelligent AI in order to spread to the stars?)
(Inspired by Paul’s reasoning here: “Most importantly, it seems like AI systems have huge structural advantages (like their high speed and low cost) that suggest they will have a transformative impact on the world (and obsolete human contributions to alignment [retracted]) well before they need to develop superhuman understanding of much of the world or tricks about how to think, and so even if they have a very different profile of abilities to humans they may still be subhuman in many important ways.” and similar to his thoughts here: “One way of looking at this is that Eliezer is appropriately open-minded about existential quantifiers applied to future AI systems thinking about how to cause trouble, but seems to treat existential quantifiers applied to future humans in a qualitatively rather than quantitatively different way (and as described throughout this list I think he overestimates the quantitative difference).”)