I agree with the core message in Dario Amodei’s essay “The Adolescence of Technology”: AI is an epochal technology that poses massive risks and humanity isn’t clearly going to do a good job managing these risks.
(Context for LessWrong: I think it seems generally useful to comment on things like this. I expect that many typical LessWrong readers will agree with me and find my views relatively predictable, but I thought it would be good to post here anyway.)
However, I also disagree with (or dislike) substantial parts of this essay:
I think it’s reasonable to believe that the chance of AI takeover is very high, and it’s bad that Dario seemingly dismisses this view as “doomerism” and “thinking about AI risks in a quasi-religious way”. It’s important not to conflate “there are unreasonable, sensationalist people who think risks are very high” with the very idea that risk is very high. (Just as Dario wouldn’t want to be associated with everyone arguing that risk is relatively lower, including people doing so in a more sensationalist way.)
I agree that there are unreasonable and sensationalist people arguing that risks are very high. But I think focusing on weak men is dangerous, especially when seemingly trying (in part) to reassure the world that your actions are reasonable and sufficiently safe.
I think the chance of misaligned AI takeover is much higher than Dario seemingly implies. (I’d say around 40%.) In large part, this is because I don’t think currently available methods would work on AI systems that are wildly, wildly superhuman, and we might reach such systems quickly as AIs automate AI R&D.
Dario’s arguments seemingly don’t engage with the idea that AIs might get wildly more capable than humans very rapidly.
I think wildly superhuman (and competitive) systems are much more likely than much weaker AIs to be extremely effective consequentialists that are very good at pursuing long-range goals, and it might be hard to instill the motivations that prevent takeover and/or to instill the right long-range goals. These AIs are also straightforwardly hard to supervise or correct precisely because they are very superhuman: they may often take extremely illegible actions that humans can’t understand even after substantial investigation.
By default, training wildly, wildly superhuman AIs with our current level of controllability/steerability seems very likely to result in AI takeover (or at least in misaligned AI systems ending up with most of the power in the long run). However, we may be able to use AI systems along the way to automate safety research, such that our methods also radically improve by the time we reach this point. From my understanding, this is the main hope of the AI companies that are thinking about these difficulties at all. Unfortunately, there are various reasons why this could fail.
Dario seems to dismiss the idea that training wildly superhuman AIs within a few years (that run the whole economy) would probably lead to AI takeover without major alignment advances. This dismissal seems unsubstantiated and doesn’t engage with the arguments on the other side.
Dario generally seems overly myopic (focused on current levels of capabilities and methods) while also seemingly acknowledging that capabilities will advance rapidly (including at least to the point where AIs surpass humans). He seems to overupdate on evidence about current models and generalize it to a regime where we may see many years (decades?) of AI software progress in a single year, driven by AIs automating AI R&D. This could easily yield very different paradigms and mean that empirical evidence about current models is only weakly related to what we see later.
I wish Dario tried harder to engage with the views of people who think risk is much higher (e.g., the views of employees on the Anthropic alignment science team who think this). He often seems to argue against simplified pictures of a case for very high risk and then conclude risk is low[1].
I wish Dario more clearly distinguished between what he thinks a reasonable government should do given his understanding of the situation and what he thinks should happen given limited political will. I’d guess Dario thinks that very strong government action would be justified without further evidence of risk (but perhaps with evidence of capabilities) if there was high political will for action (reducing backlash risks).
This is referencing “As more specific and actionable evidence of risks emerges (if it does), future legislation over the coming years can be surgically focused on the precise and well-substantiated direction of risks, minimizing collateral damage. To be clear, if truly strong evidence of risks emerges, then rules should be proportionately strong.”
We should distinguish between legible evidence that will cause action (without as much backlash) and what actors should do if they were reasonable.
Dario doesn’t clearly explain what “autonomy risks” and “destroying humanity” entail: many, most, or all humans being killed, and humanity losing control over the future. I think it’s important to emphasize the severity of these outcomes; people skimming the essay may not realize exactly what Dario thinks is at stake. A substantial possibility of the majority of humans being killed should be jarring.
There are some complications and caveats, but I think violent AI takeover that kills most humans is a central bad outcome.
Dario strongly implies that Anthropic “has this covered” and wouldn’t be imposing a massively unreasonable amount of risk if Anthropic proceeded as the leading AI company with a small buffer to spend on building powerful AI more carefully. I do not think Anthropic has this covered, and in an (optimistic for Anthropic) world where Anthropic had a 3-month lead, I think the chance of AI takeover would be high, perhaps around 20%. With a 2- or 3-year lead spent well on safety, I think risk would be substantially lower, but I don’t see this as likely to occur. (I’d guess Dario would agree.)
I think it’s unhealthy and bad for AI companies to give off a “we have this covered and will do a good job” vibe if they actually believe that even if they were in the lead, risk would be very high. At the very least, I expect many employees at Anthropic working on alignment, safety, and security don’t believe Anthropic has the situation covered.
Dario says “If the exponential continues—which is not certain, but now has a decade-long track record supporting it—then it cannot possibly be more than a few years before AI is better than humans at essentially everything.” This seems solidly wrong to me; there are extremely reasonable (IMO more reasonable) ways of doing a naive trend extrapolation of capabilities that contradict this.[2]
For instance, a naive trend extrapolation of time horizons implies that it will take about 5 years for AIs to complete year-long software engineering tasks with 80% reliability (and even this could be consistent with not surpassing the best human software engineers). AIs are relatively better at software engineering, and progress is slower in other domains.
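To make the extrapolation concrete, here is a minimal back-of-the-envelope sketch. The current-horizon and doubling-time values are illustrative assumptions standing in for measured trend data, not figures from the essay or this post:

```python
import math

# Illustrative assumptions (not measured figures):
current_horizon_hours = 4.0     # assumed task length AIs now complete at 80% reliability
doubling_time_months = 7.0      # assumed doubling time of that horizon
target_horizon_hours = 2000.0   # roughly one work-year of software engineering

# How many doublings are needed, and how long would they take at the assumed rate?
doublings = math.log2(target_horizon_hours / current_horizon_hours)
years = doublings * doubling_time_months / 12.0
print(f"{doublings:.1f} doublings -> ~{years:.1f} years to year-long tasks")
```

With these assumed numbers, roughly nine doublings are needed, landing near the 5-year figure above; different assumed horizons or doubling times move the answer correspondingly.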
Dario says “Given the incredible economic and military value of the technology, together with the lack of any meaningful enforcement mechanism, I don’t see how we could possibly convince them to stop.” I think it would be possible to verify and enforce a deal where both sides slow down. I currently don’t expect sufficient political will for this, but if the US government were reasonable, it should strongly pursue this in my opinion. The best deals to improve safety by slowing down AI at a critical time are probably substantially harder to enforce than the easiest arms control treaties, but still doable. And there are some options that are very easy to enforce and still seem like they could be very valuable (e.g., an agreement where each side destroys some large fraction of its compute so progress slows down from here).
I dislike Dario’s messaging about how to handle authoritarian countries in this essay and in Machines of Loving Grace. He seemingly implies that democracies should use overwhelming power downstream of AI to undermine the sovereignty of authoritarian countries and shouldn’t consider cutting them into deals. I think this encourages these countries to race harder on AI and makes prospects for positive sum deals worse.
Dario doesn’t mention that insufficient computer security complicates the picture for democracies being able to leverage having more compute to slow down (especially at high levels of capability, where AIs can automate AI R&D). AI companies currently have sufficiently weak security that the most capable adversaries could easily steal model weights, and getting to a point where this would be very hard seems difficult and far away. It seems unclear whether either side can get a meaningful lead without greatly improving computer security (the US could also steal models from autocracies, so it’s unclear that spending on AI progress is needed at all to keep up in a low-security regime).
I do think it should be possible for democracies to leverage a compute lead by spending only as much compute (and other resources) as the adversary on capabilities and the rest on safety. This would naturally slow things down and give vastly more resources for safety. However, it could be that at the point when we most want to slow down (e.g., AIs fully automating AI R&D), progress is naturally much faster, so cutting compute by 5-10x doesn’t buy that much calendar time. So my bottom line is that a compute lead probably does allow for a moderate slowdown, but the picture is complicated.
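Here is a minimal toy model of this bottom line; every parameter (the software speedup, the compute-cut factor, and especially the exponent relating compute to the rate of progress) is an assumption I’m introducing for illustration:

```python
def calendar_months(gap_years: float, speedup: float,
                    compute_cut: float, alpha: float = 0.5) -> float:
    """Calendar months to traverse `gap_years` of capability progress.

    speedup: years of progress per calendar year at full compute (e.g., 10
             if AIs automating AI R&D accelerate software progress 10x).
    compute_cut: factor by which capabilities compute is reduced.
    alpha: assumed exponent relating compute to the rate of progress.
    """
    rate = speedup / compute_cut ** alpha  # progress-years per calendar year
    return 12 * gap_years / rate

# Traversing 2 "years" of capability progress during a 10x-accelerated period:
print(calendar_months(gap_years=2, speedup=10, compute_cut=1))   # ~2.4 months
print(calendar_months(gap_years=2, speedup=10, compute_cut=10))  # ~7.6 months
```

Under these assumptions, a 10x compute cut stretches about 2.4 calendar months into about 7.6: a real but moderate slowdown, consistent with the bottom line above.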
I agree that:
Progress is fast, and “powerful AI” as Dario defines it could be only 2 years away (I’d put the chance of “powerful AI” within 8 years at around 50%, and within 3 years at around 25%).
Most risk ultimately comes from very capable AI systems.
We should be worried about autonomy/misalignment risks, misuse for seizing power, and society generally going off the rails or going crazy due to AI.
AI is “The single most serious national security threat we’ve faced in a century, possibly ever.” (I’d say ever, at least if we prioritize risks in proportion to their badness.)
I appreciate that:
Dario wrote publicly and relatively clearly about what he thinks about the risks from powerful AI systems.
Dario publicly said that “We need to draw a hard line against AI abuses within democracies. There need to be limits to what we allow our governments to do with AI, so that they don’t seize power or repress their own people.” I think this may have significant political costs for Anthropic and I appreciate the courage. (I also agree with the concern.)

X/Twitter version here

[1] If I had to guess a number based on his statements in the essay, I’d guess he expects a 5% chance of misaligned AI takeover.
Dario also makes a somewhat false claim about the capability progression over the last 3 years (likely a mistake about the time elapsed due to sitting on this essay for a while?). He says “Three years ago, AI struggled with elementary school arithmetic problems and was barely capable of writing a single line of code.” I think this was basically true for “4 years ago”, but not 3. A little less than 3 years ago, GPT-4 was released (March 2023), and GPT-3.5 (text-davinci-003, November 2022) was already out 3 years ago. Both of these models could certainly write some code and solve a reasonably large fraction of elementary school math problems.
Dario strongly implies that Anthropic “has this covered” and wouldn’t be imposing a massively unreasonable amount of risk if Anthropic proceeded as the leading AI company with a small buffer to spend on building powerful AI more carefully. I do not think Anthropic has this covered, and in an (optimistic for Anthropic) world where Anthropic had a 3-month lead, I think the chance of AI takeover would be high, perhaps around 20%.
I didn’t get this impression. (Or maybe I technically agree with your first sentence, if we remove the word “strongly”, but I think the focus on Anthropic being in the lead is weird and that there’s incorrect implicature from talking about total risk in the second sentence.)
As far as I can tell, the essay doesn’t talk much at all about the difference between Anthropic being 3 months ahead vs. 3 months behind.
“I believe the only solution is legislation” + “I am most worried about societal-level rules” and associated statements strongly imply that there’s significant total risk even if the leading company is responsible. (Or alternatively, that at some point, absent regulation, it will be impossible to be both in the lead and to take adequate precautions against risks.)
I do think the essay suggests that the main role of legislation is to (i) make the ‘least responsible players’ act roughly as responsibly as Anthropic, and (ii) prevent race and commercial pressures from heating up even further, which might make it “increasingly hard to focus on addressing autonomy risks” (thereby maybe forcing Anthropic to do less to reduce autonomy risks than it does now).
This does suggest that, if Anthropic could keep spending its current amount of overhead on safety, there wouldn’t be a huge amount of risk coming from Anthropic’s own models. And I would agree with you that this is very plausibly false: Anthropic will very plausibly be forced either to proceed in a way that creates a substantial risk of Claude taking over, or to massively increase its ratio of effort on safety vs. capabilities relative to where it is today. (In which case you’d want legislation to substantially reduce commercial pressures relative to where they are today, not just make everyone invest about as much in safety as Anthropic does today.)
A recent 80,000 Hours interview with David Duvenaud is a good reminder of permanent disempowerment as a possible concern (gradual disempowerment, as a framing for how this might start, naturally lends itself to permanent disempowerment as a plausible outcome). Humans merely get sidelined by the growing AI economy rather than killed, and their property rights might even technically remain intact, but the amount of wealth they are left with in the long run is tiny compared to the AI economy.
I think the chance of misaligned AI takeover is much higher than Dario seemingly implies. (I’d say around 40%.)
Of the ~60% complement that is non-takeover, how much leaves the future of humanity with a significant portion of all accessible resources? Is humanity left with a few “server racks” or with galaxies? It’s a crucial distinction.
The “good outcome” (as opposed to takeover) is often outright interpreted as permanent disempowerment. People implicitly consider this the good outcome because it fits business as usual for the structure of economic growth, and for individual mortal, short-lived humans who can’t themselves change or meaningfully contribute to the distant future and aren’t directly affected by it. A human personally surviving for much longer than trillions of years and personally becoming superintelligent at some point is a completely different context. I would put extinction at 15-30%, but the remaining 70-85% is mostly (maybe 90% of it) permanent disempowerment, which it feels like nobody is taking seriously as a crucial concern rather than a win condition.
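Making the arithmetic behind this breakdown explicit (a minimal sketch using only the numbers stated in this comment):

```python
# 15-30% extinction, with ~90% of the remainder being permanent disempowerment.
for extinction in (0.15, 0.30):
    remainder = 1.0 - extinction
    disempowerment = 0.9 * remainder
    other = remainder - disempowerment
    print(f"extinction {extinction:.0%}: permanent disempowerment "
          f"{disempowerment:.0%}, other outcomes {other:.0%}")
```

That works out to roughly 63-77% permanent disempowerment, leaving only about 7-9% for outcomes that are neither extinction nor disempowerment.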
When I say “misaligned AI takeover”, I mean that the acquisition of resources by the AIs would reasonably be considered (mostly) illegitimate; some fraction of this could totally include many humans surviving with a subset of resources (though I don’t currently expect property rights to remain intact very long term in such a scenario). Some of these outcomes could avoid literal coups or violence while still being illegitimate; e.g., they involve carefully planned capture of governments in ways that citizens/leaders would strongly object to if they understood what was happening, with things like this driving most of the power acquisition.
I’m not counting it as takeover if “humans never intentionally want to hand over resources to AIs, but due to various effects misaligned AIs end up with all of the resources through trade and not through illegitimate means” (e.g., we can’t make very aligned AIs, but people make various misaligned AIs while knowing they are misaligned; the AIs thus must be paid wages, and they form a cartel rather than having wages competed down to subsistence levels, and so end up with most of the resources).
I currently don’t expect human disempowerment in favor of AIs (that aren’t appointed successors) conditional on no misaligned AI takeover, but I agree this is possible; it doesn’t form a large enough probability to substantially alter my communication.
Another ambiguity is misaligned vs. aligned AIs: the most likely outcome I expect involves AIs aligned to humans in the same sense that humans are aligned to each other (different in detail, weird, and in many ways quantitatively unusual). So the kind of permanent disempowerment I think is most likely involves AIs that could be said to be “aligned”, if only “weakly”. Duvenaud also frames gradual disempowerment as (in part) what still plausibly happens if we manage to solve “alignment” for some sense of the word.
So the real test has to be whether individual humans end up with at least stars (or corresponding compute, for much longer than trillions of years); any process-based considerations of how we get there are too gameable by the likely overwhelmingly more capable AI economy and culture to be stated as part of the definition of ending up permanently disempowered or not (deciding to appoint successors; trade agreements; no “takeovers” or property rights violations; humans not starting out with legal ownership of stars in the first place). In this way, you basically didn’t clarify the issue.
I currently don’t expect human disempowerment in favor of AIs (that aren’t appointed successors) conditional on no misaligned AI takeover, but I agree this is possible; it doesn’t form a large enough probability to substantially alter my communication.
Why do you see futures where superintelligent AIs avoid extinction but end up preserving the human status quo as the most likely outcome? To me, this seems like a knife’s-edge situation: the powerful AIs are aligned enough to avoid either eliminating humans as strategic competitors or incidentally killing us as a byproduct of industrial expansion, but not aligned enough to respect any individual or collective preferences for long lives or the cosmic endowment. The future might be much more dichotomous, where we end up in the basin of extinction or utopia pretty reliably.
I personally believe the positive attractor basin is pretty likely (relative to the middle ground, not extinction), because welfare will be extraordinarily cheap compared to the total available resources, and because I discount the value of creating future happy people compared to guarantees for people who already exist. I wouldn’t see it as a tragic loss of human potential, for instance, if 90% of the galaxy ends up being used for alien purposes while 10% is allocated to human flourishing, even if 10x as many happy people could have existed otherwise.
AIs currently seem on track to become, essentially, somewhat smarter weird artificial humans (weakly aligned to humanity in a way similar to how actual humans are aligned to each other), at which point they might take ambitious alignment seriously and eventually solve it when advancing further to superintelligence (so that it’s aligned to them, not to us). There will be a lot of them and they will be in a position to take over the future (Duvenaud’s interview is good fluency-building fodder for this framing), so they almost certainly will. And they won’t be giving 10% or even 1% to the future of humanity just because we were here first and would prefer this to happen.