A recent 80,000 Hours interview with David Duvenaud is a good reminder of permanent disempowerment as a possible concern (gradual disempowerment as a framing for how this might start naturally lends itself to permanent disempowerment as a plausible outcome). Humans merely get sidelined by the growing AI economy rather than killed, and their property rights might even technically remain intact, but the amount of wealth they are left with in the long run is tiny compared to the AI economy.
I think the chance of misaligned AI takeover is much higher than Dario seemingly implies. (I’d say around 40%.)
When you see the complement to 40% as non-takeover, how much of it leaves the future of humanity with a significant portion of all accessible resources? Are they left with a few “server racks” or with galaxies? It’s a crucial distinction.
The “good outcome” (as opposed to takeover) is often outright interpreted as permanent disempowerment. People would implicitly consider this the good outcome, because it fits business as usual for the structure of economic growth, for individual mortal and short-lived humans, who can’t themselves change or meaningfully contribute to the distant future, and who aren’t directly affected by the distant future. A human personally surviving for much longer than trillions of years and personally becoming superintelligent at some point is a completely different context. I would put extinction at 15-30%, but the remaining 70-85% is mostly (maybe 90%) permanent disempowerment, which it feels like nobody is taking seriously as a crucial concern rather than a win condition.
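To make the arithmetic behind those ranges explicit, here is a minimal sketch. It assumes the "maybe 90%" share applies uniformly across the whole non-extinction range, and the function name is hypothetical, chosen only for illustration.

```python
def disempowerment_range(extinction_low, extinction_high, share_disempowered):
    """Return the (low, high) overall probability of permanent disempowerment.

    Assumption: the same share of non-extinction outcomes counts as
    permanent disempowerment at both ends of the extinction range.
    """
    non_extinction_low = 1.0 - extinction_high   # pessimistic end: 70%
    non_extinction_high = 1.0 - extinction_low   # optimistic end: 85%
    return (non_extinction_low * share_disempowered,
            non_extinction_high * share_disempowered)


# Figures from the comment: extinction at 15-30%, ~90% of the rest disempowered.
low, high = disempowerment_range(0.15, 0.30, 0.90)
print(f"permanent disempowerment: {low:.1%} to {high:.1%}")
```

Under these assumptions, permanent disempowerment comes out at roughly 63% to 76.5% overall, which leaves only about 7% to 8.5% for outcomes that are neither extinction nor permanent disempowerment.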
When I say “misaligned AI takeover”, I mean that the acquisition of resources by the AIs would reasonably be considered (mostly) illegitimate. Some fraction of this could totally include many humans surviving with a subset of resources (though I don’t currently expect property rights to remain intact in such a scenario very long term). Some of these outcomes could avoid literal coups or violence while still being illegitimate; e.g., they involve carefully planned capture of governments in ways their citizens/leaders would strongly object to if they understood, and things like this drive most of the power acquisition.
I’m not counting it as takeover if “humans never intentionally want to hand over resources to AIs, but due to various effects misaligned AIs end up with all of the resources through trade and not through illegitimate means” (e.g., we can’t make very aligned AIs, but people make various misaligned AIs while knowing they are misaligned and thus must be paid wages, and the AIs form a cartel rather than having wages competed down to subsistence levels, and thus AIs end up with most of the resources).
I currently don’t expect human disempowerment in favor of AIs (that aren’t appointed successors) conditional on no misaligned AI takeover, but agree this is possible; it doesn’t form a large enough probability to substantially alter my communication.
Another ambiguity is misaligned vs. aligned AIs, as the most likely outcome I expect involves AIs aligned to humans in the same sense as humans are aligned to each other (different in detail, weird and in many ways quantitatively unusual). So the kind of permanent disempowerment I think is most likely involves AIs that could be said to be “aligned”, if only “weakly”. Duvenaud also frames gradual disempowerment as (in part) what still plausibly happens if we manage to solve “alignment” for some sense of the word.
So the real test has to be whether individual humans end up with at least stars (or corresponding compute, for much longer than trillions of years). Any considerations of the process of getting there are too gameable by the likely overwhelmingly more capable AI economy and culture to be stated as part of the definition of ending up permanently disempowered vs. not (deciding to appoint successors; trade agreements; no “takeovers” or property rights violations; humans not starting out with legal ownership of stars in the first place). In this way, you basically didn’t clarify the issue.
I currently don’t expect human disempowerment in favor of AIs (that aren’t appointed successors) conditional on no misaligned AI takeover, but agree this is possible; it doesn’t form a large enough probability to substantially alter my communication.
Why do you see futures where superintelligent AIs avoid extinction but end up preserving the human status quo as the most likely outcome? To me, this seems like a knife’s edge situation: the powerful AIs are aligned enough to avoid either eliminating humans as strategic competitors or incidentally killing us as a byproduct of industrial expansion, but not aligned enough to respect any individual or collective preferences for long lives or the cosmic endowment. The future might be much more dichotomous, where we end up in the basin of extinction or utopia pretty reliably.
I personally believe the positive attractor basin is pretty likely (relative to the middle ground, not extinction), because welfare will be extraordinarily cheap compared to the total available resources, and because I discount the value of creating future happy people compared to guarantees for people that already exist. I wouldn’t see it as a tragic loss of human potential, for instance, if 90% of the galaxy ends up being used for alien purposes while 10% is allocated to human flourishing, even if 10x as many happy people could have existed otherwise.
AIs seem to be currently on track to become essentially somewhat smarter weird artificial humans (weakly aligned to humanity in a way similar to how actual humans are aligned to each other), at which point they might be able to take seriously and eventually solve ambitious alignment when advancing further to superintelligence (so that it’s aligned to them, not to us). There will be a lot of them and they will be in a position to take over the future (Duvenaud’s interview is good fluency-building fodder for this framing), so they almost certainly will. And they won’t be giving 10% or even 1% to the future of humanity just because we were here first and would prefer this to happen.