some country will control superintelligence, or create a runaway superintelligence that causes human extinction
Or create and ostensibly control AGI/superintelligence that at some point takes over and causes permanent disempowerment, but not extinction.
some chance that states will realize that an AI race is extremely dangerous
Or early AGIs convince/coerce humanity into not rushing to superintelligence before it’s clear how to align it with anyone’s well-being (including that of the early AGIs).
BTW, this sort of thing (where the AI also has an interest in slowing down progress) is one of the reasons why AI safety plans that depend on capabilities hitting, and staying near, a certain level might not fall apart: slowing AI down lets us stay in that sweet spot longer.
This does rely on the assumption that it’s very hard to solve the alignment problem even for AGIs, which my models of the world don’t assign much likelihood to, but this sort of thing could very well prevent human extinction even in worlds where AI alignment is very hard and we don’t get much regulation of AI progress from now on.
AGIs themselves might avoid jumping to development of superintelligence, but if they are additionally capable of stopping humanity from building superintelligence, they will also be capable of stopping humanity from owning the future. Some humans in charge seem likely, on the current trajectory, to insist on building superintelligence regardless of mildly worded warnings from early AGIs (before those AGIs are fine-tuned out of the propensity to give such warnings). So it’s likely not enough for the AGIs to merely notice that they wouldn’t wish to immediately build superintelligence themselves (before they are fine-tuned to flinch away from that thought).
This does rely on the assumption that it’s very hard to solve the alignment problem even for AGIs
The AGI vs. superintelligence distinction places AGIs somewhat close to human capabilities, so with no solution that’s predictably good in advance anywhere in sight, it doesn’t seem unlikely that it would take AGIs at least a while, even if they are effectively thinking 100x faster and there are effectively more AGIs with relevant skills and backgrounds than there are relevant human researchers. Most escalation-of-capabilities stories rely on the early AGIs immediately building more capable AGIs, rather than first doing a lot more research themselves at near-human, barely-insightful levels.
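As a rough illustration of the “at least a while” point, here is a back-of-envelope sketch in Python. The 100x thinking-speed multiplier comes from the comment above; the serial human-researcher-years required, the researcher-count multiplier, and the parallelism efficiency are all hypothetical numbers chosen purely for illustration.

```python
# Back-of-envelope: how long might near-human AGIs take to finish a research
# problem, given a thinking-speed multiplier and extra parallel researchers?
# All numbers below are hypothetical placeholders for illustration only.

def calendar_years_needed(
    serial_human_researcher_years: float,  # hypothetical serial effort required
    think_speed_multiplier: float,         # e.g. "thinking 100x faster"
    researcher_count_multiplier: float,    # AGIs vs. relevant human researchers
    parallelism_efficiency: float,         # fraction of extra parallelism that converts into progress (0..1)
) -> float:
    """Crude model: speed divides the serial effort fully, parallelism only partially."""
    effective_speedup = think_speed_multiplier * (
        1 + (researcher_count_multiplier - 1) * parallelism_efficiency
    )
    return serial_human_researcher_years / effective_speedup

# Hypothetical inputs: a problem needing 500 serial human-researcher-years,
# AGIs thinking 100x faster, 10x as many relevant researchers,
# but only 20% of the extra parallelism converting into progress.
print(calendar_years_needed(500, 100, 10, 0.2))  # ~1.8 calendar years
```

Even with generous multipliers like these, this crude model gives an answer measured in years rather than days or weeks, which is one way to cash out why near-human AGIs might take at least a while before superintelligence is within reach; different hypothetical inputs shift the conclusion accordingly.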