Here’s an attempt to recap the previous discussion about “Global Shutdown” vs “Plan A/Controlled Takeoff”, trying to skip ahead to the part where we’re moving the conversation forward rather than rehashing stuff.
Cruxes that seemed particularly significant (phrased the way they made most sense to me, which is hopefully reasonably ITT-passing):
...
How bad is Chinese Superintelligence? For some people, it’s a serious crux whether a China-run superintelligence would lead to dramatically worse outcomes than one run by a democratic country.
...
“The gameboard could change in all kinds of bad ways over 30 years.” Nations or companies could suddenly pull out in a disastrous way. If things go down in the near future, there are fewer actors to make deals with and it’s easier to plan things out.
...
Can we leverage useful work out of significantly-more-powerful-but-nonsuperhuman AIs? Especially since “the gameboard might change a lot”, it’s useful to get lots of safety research done quickly, and it’s easier to do that with more powerful AIs. So, it’s useful to continue to scale up until we’ve got the most powerful AIs that we can confidently control. (Whereas Controlled Takeoff skeptics tend to think AI that is capable of taking on the hard parts of AI safety research will already be too dangerous and untrustworthy.)
...
Is there a decent chance an AI takeover is relatively nice? Giving the humans the Earth/solar system is just incredibly cheap from a percentage-of-resources standpoint. This does require the AI to genuinely care about and respect our agency in a sort of complete way. But, it only has to care about us a pretty teeny amount.
[Edit: this was an interesting disagreement but I don’t know anyone for whom it’s strategically relevant, except in what arguments to publicize about whether if anyone built it, everyone would die]
...
And then, the usual “how doomed are current alignment plans?”. My impression is “Plan A” advocates are usually expecting a pretty good chance things go pretty well if humanity is making a reasonably good-faith attempt at controlled takeoff, whereas Controlled Takeoff skeptics are typically imagining “by default this just goes really poorly, you can tell because everyone seems to keep sliding off understanding or caring about the hard parts of the problem”.
...
All of those seem like reasonable things for smart, thoughtful people to disagree on. I do think some disagreement about them feels fishy/sus to me, and I have my takes on them, but, I can see where you’re coming from.
Three cruxes I still just don’t really buy as decision-relevant:
“We wouldn’t want to pause 30 years, and then do a takeoff very quickly – it’s probably better to do a smoother takeoff.” Yep, I agree. But, if you’re in a position to decide-on-purpose how smooth your takeoff is, you can still just do the slower one later. (Modulo “the gameboard could change in 30 years”, which makes more sense to me as a crux). I don’t see this as really arguing at all against what I imagined the Treaty to be about.
“We need some kind of exit plan, the MIRI Treaty doesn’t have one.” I currently don’t really buy that Plan A has more of one than the MIRI Treaty. The MIRI treaty establishes an international governing body that makes decisions about how to change the regulations, and it’s pretty straightforward for such an org to make judgment calls once people have started producing credible safety cases. I think imagining anything more specific than this feels pretty fake to me – that’s a decision that makes more sense to punt to people who are more informed than us.
Shutdown is more politically intractable than Controlled Takeoff. I don’t currently buy that this is true in practice. I don’t think anyone is expecting to immediately jump to either a full-fledged version of Plan A, or a Global Shutdown. Obviously, for the near future, you try for whatever level of national and international cooperation you can get, build momentum, do the easy sells first, etc. I don’t expect, in practice, Shutdown to be different from “you did all of Plan A, and then took like 2-3 more steps.” And by the time you’ve implemented Plan A in its entirety, it seems crazy to me to assume the next 2-3 steps are particularly intractable.
I totally buy “we won’t even get to a fully fledged version of Plan A”, but, that’s not an argument for Plan A over Shutdown.
It feels like people are imagining “naive, poorly politically executed version of Shutdown, vs some savvily executed version of Plan A.” I think there are reasonable reasons to think the people advocating Shutdown will not be savvy. But, those reasons don’t extend to “insofar as you thought you could savvily advocate for Plan A, you shouldn’t be setting your sights on Shutdown.”
Thanks, I thought this was a helpful comment. Putting my responses inline in case it’s helpful for people. I’ll flag that I’m a bit worried about confirmation bias / digging my heels in: would love to recognize it if I’m wrong.
> How bad is Chinese Superintelligence? For some people, it’s a serious crux whether a China-run superintelligence would lead to dramatically worse outcomes than one run by a democratic country.
This isn’t a central crux for me, I think. I would say that it’s worse, but that I’m willing to make concessions here in order to make alignment more likely to go well.
> “The gameboard could change in all kinds of bad ways over 30 years.” Nations or companies could suddenly pull out in a disastrous way. If things go down in the near future, there are fewer actors to make deals with and it’s easier to plan things out.
This is the main thing for me. We’ve done a number of wargames of this sort of regime, and the regime often breaks down. (Though there are things that can be done to make it harder to leave the regime, which I’m strongly in favor of.)
> Can we leverage useful work out of significantly-more-powerful-but-nonsuperhuman AIs? Especially since “the gameboard might change a lot”, it’s useful to get lots of safety research done quickly, and it’s easier to do that with more powerful AIs. So, it’s useful to continue to scale up until we’ve got the most powerful AIs that we can confidently control. (Whereas Controlled Takeoff skeptics tend to think AI that is capable of taking on the hard parts of AI safety research will already be too dangerous and untrustworthy.)
Yep, I think we plausibly can leverage controlled AIs to do existentially useful work. But I’m not confident, and I am not saying that control is probably sufficient. I think superhuman isn’t quite the right abstraction (as I think it’s pretty plausible we can control moderately superhuman AIs, particularly ones that are only superhuman in certain domains), but that’s a minor point. I think Plan A attempts to be robust to the worlds where this doesn’t work by just pivoting back to human intelligence augmentation or whatever.
> Is there a decent chance an AI takeover is relatively nice? Giving the humans the Earth/solar system is just incredibly cheap from a percentage-of-resources standpoint. This does require the AI to genuinely care about and respect our agency in a sort of complete way. But, it only has to care about us a pretty teeny amount.
This is an existential catastrophe IMO and should be desperately avoided, even if they do leave us a solar system or w/e.
> And then, the usual “how doomed are current alignment plans?”. My impression is “Plan A” advocates are usually expecting a pretty good chance things go pretty well if humanity is making a reasonably good-faith attempt at controlled takeoff, whereas Controlled Takeoff skeptics are typically imagining “by default this just goes really poorly, you can tell because everyone seems to keep sliding off understanding or caring about the hard parts of the problem”.
I think the thing that matters here is the curve of “likelihood of alignment success” vs “years of lead time burned at takeoff”. We are attempting to do a survey of this among thinkers in this space who we most respect on this question, and I do think that there’s substantial win equity moving from no lead time to years or decades of lead time. Of course, I’d rather have higher assurance, but I think that you really need to believe the very strong version of “current plans are doomed” to forego Plan A. I’m very much on board with “by default this goes really poorly”.
> Three cruxes I still just don’t really buy as decision-relevant:
> “We wouldn’t want to pause 30 years, and then do a takeoff very quickly – it’s probably better to do a smoother takeoff.” Yep, I agree. But, if you’re in a position to decide-on-purpose how smooth your takeoff is, you can still just do the slower one later. (Modulo “the gameboard could change in 30 years”, which makes more sense to me as a crux). I don’t see this as really arguing at all against what I imagined the Treaty to be about.
huh, this one seems kinda relevant to me.
> “We need some kind of exit plan, the MIRI Treaty doesn’t have one.” I currently don’t really buy that Plan A has more of one than the MIRI Treaty. The MIRI treaty establishes an international governing body that makes decisions about how to change the regulations, and it’s pretty straightforward for such an org to make judgment calls once people have started producing credible safety cases. I think imagining anything more specific than this feels pretty fake to me – that’s a decision that makes more sense to punt to people who are more informed than us.
If the international governing body starts approving AI development, then aren’t we basically just back in the Plan A regime? Ofc I only think that scaling should happen once people have credible safety cases. I just think control-based safety cases are sufficient. I think that we can make some speculations about what sorts of safety cases might work and which ones don’t. And I think that the fact that the MIRI treaty isn’t trying to accelerate prosaic safety / substantially slows it down is a major point against it, which is reasonable to summarize as them not having a good exit plan.
I’m very sympathetic to pausing until we have uploads / human intelligence augmentation; that seems good, and I’d like to do that in a good world.
> Shutdown is more politically intractable than Controlled Takeoff. I don’t currently buy that this is true in practice. I don’t think anyone is expecting to immediately jump to either a full-fledged version of Plan A, or a Global Shutdown. Obviously, for the near future, you try for whatever level of national and international cooperation you can get, build momentum, do the easy sells first, etc. I don’t expect, in practice, Shutdown to be different from “you did all of Plan A, and then took like 2-3 more steps.” And by the time you’ve implemented Plan A in its entirety, it seems crazy to me to assume the next 2-3 steps are particularly intractable.
> I totally buy “we won’t even get to a fully fledged version of Plan A”, but, that’s not an argument for Plan A over Shutdown.
> It feels like people are imagining “naive, poorly politically executed version of Shutdown, vs some savvily executed version of Plan A.” I think there are reasonable reasons to think the people advocating Shutdown will not be savvy. But, those reasons don’t extend to “insofar as you thought you could savvily advocate for Plan A, you shouldn’t be setting your sights on Shutdown.”
This one isn’t a crux for me I think. I do probably think it’s a bit more politically intractable, but even that’s not obvious because I think shutdown would play better with the generic anti-tech audience, while Plan A (as currently written) involves automating large fractions of the economy before handoff.
> If the international governing body starts approving AI development, then aren’t we basically just back in the Plan A regime?
I think MIRI’s plan is clearly meant to eventually build superintelligence, given that they’ve stated various times it’d be an existential catastrophe if this never happened – they just think it should happen after a lot of augmentation and carefulness.
A lot of my point here is I just don’t really see much difference between Plan A and Shutdown except for “once you’ve established some real control over AI racing, what outcome are you shooting for nearterm?”, and I’m confused why Plan A advocates see it as substantially different.
(Or, I think the actual differences are more about “how you expect it to play out in practice, esp. if MIRI-style folk end up being a significant political force.” Which is maybe fair, but, it’s not about the core proposal IMO.)
> “We wouldn’t want to pause 30 years, and then do a takeoff very quickly – it’s probably better to do a smoother takeoff.”
> huh, this one seems kinda relevant to me.
Do you understand why I don’t understand why you think that? Like, the MIRI plan is clearly aimed at eventually building superintelligence (I realize the literal treaty doesn’t emphasize that, but, it’s clear from very public writing in IABIED that it’s part of the goal), and I think it’s pretty agnostic over exactly how that shakes out.
> Is there a decent chance an AI takeover is relatively nice?
> This is an existential catastrophe IMO and should desperately avoided, even if they do leave us a solar system or w/e.
Actually, I think this maybe wasn’t cruxy for anyone. I think @ryan_greenblatt said he agreed it didn’t change the strategic picture, it just changed some background expectations.
(I maybe don’t believe him that he doesn’t think it affects the strategic picture? It seemed like his view was fairly sensitive to various things being like 30% likely instead of like 5% or <1%, and it feels like it’s part of an overall optimistic package that adds up to being more willing to roll the dice on current proposals? But, I’d probably believe him if he reads this paragraph and is like “I have thought about whether this is a (maybe subconscious) motivation/crux and am confident it isn’t.”)
Not a crux for me ~at all. Some upstream views that make me think “AI takeover but humans stay alive” is more likely and also make me think avoiding AI takeover is relatively easier might be a crux.
> I maybe don’t believe him that he doesn’t think it affects the strategic picture? It seemed like his view was fairly sensitive to various things being like 30% likely instead of like 5% or <1%, and it feels like it’s part of an overall optimistic package that adds up to being more willing to roll the dice on current proposals?
Insofar as you’re just assessing which strategy reduces AI takeover risk the most, there’s really no way that “how bad is takeover” could be relevant. (Other than, perhaps, having implications for how much political will is going to be available.)
“How bad is takeover?” should only be relevant when trading off “reduced risk of AI takeover” with affecting some other trade-off. (Such as risk of earth-originating intelligence going extinct, or affecting probability of US dominated vs. CCP dominated vs. international cooperation futures.) So if this was going to be a crux, I would bundle it together with your Chinese superintelligence bullet point, and ask about the relative goodness of various aligned superintelligence outcomes vs. AI takeover. (Though seems fine to just drop it since Ryan and Thomas don’t think it’s a big crux. Which I’m also sympathetic to.)
> We’ve done a number of wargames of this sort of regime and the regime often breaks down.

I’d be curious to hear how it breaks down.