$500 bounty for engagement on asymmetric AI risk

Announcing a $500 bounty for work that meaningfully engages with the idea of asymmetric existential AI risk.

Background

Existential risk has been defined by the rationalist/Effective Altruist sphere as existential relative to the human species, under the premise that the continuation of the species has very high value. This framing provided a strong rationality (or effectiveness) grounding for large investments in AI alignment research at a time when the risks still seemed remote and obscure to most people. However, as an apparent side effect, “AI risk” and “risk of a misaligned AI destroying humanity” have become nearly conflated.

Over the past couple of years I have attempted to draw attention to highly asymmetric AI risks, in which a small number of controllers of “aligned” (from their point of view) AI employ it to kill the rest of the human population. From the point of view of the average person, who would stand to be killed along with their children and approximately everyone they personally know, this ought to count meaningfully as existential risk. Arguably, by a logic similar to the one used to justify early alignment research, even a low probability of such an outcome is bad enough to justify investment in its prevention. Furthermore, prevention by way of arresting AI development conveniently provides a two-for-one solution, since it also addresses the misalignment problem. Conversely, investments in ensuring successful AI “alignment” without evaluating the full destructive potential of aligned AI potentially make the investor complicit in genocide. These points suggest that members of the rationalist/Effective Altruist sphere should, at least on my understanding of their stated commitments, have a strong interest in asymmetric existential AI risk. Yet so far my efforts have revealed no evidence of such interest.

This bounty is an attempt to stimulate engagement through small monetary reward(s). More concretely, the goal is to broadly shift the status of this risk from “unacknowledged” (which could mean “possible but highly psychologically inconvenient”) to “examined and assigned objective weight,” even if the weight assigned is very low.

Existing Work

  • My latest post on this topic, linking to a longform essay and the previous post

  • A 1999 book I was recently made aware of (with a focus on nanotechnology rather than AI)

Terms

I will keep this bounty open for two weeks, through June 24th, 2025, or until I feel the full amount can be fairly distributed, whichever comes first. If you are willing to help voluntarily without compensation, that would also be highly appreciated.

Any good-faith and meaningful engagement with the topic, at the object or meta-level, including effort to promote further engagement or to rebut my assertions about its neglected status, is eligible for a portion of the bounty. Tasteful cross-posting counts. Add a comment here, DM me on LessWrong, or use one of the contact methods listed at https://populectomy.ai with an unambiguous request to be rewarded.

June 11th Update: All contributions in comments so far have been valuable. Due to the mechanics of LessWrong, however, I would especially value top-level posts.

For example: a review of Lin Sten’s book (link above), with implications for what it would mean to a rationalist if it contained any truth.

July 18th Update: $150 awarded to Mitchell_Porter for a lengthy comment and for bringing attention to Vernor Vinge’s The Peace War.

Sept 1st Update: $175 awarded to Alexey Turchin for comments about “aging risk” and follow-ups. Note that the amount rewards not effort or thoughtfulness, but valuable forthrightness.