Thanks. I’ll leave some responses, but it also feels fine to leave things here for now.
I think shutting down all AI development is much more costly than not shutting it down, in a pretty straightforward sense that will in fact probably be priced into the required level of political will: Nvidia is in fact much worse off if all AI development shuts down than if AI development proceeds but capabilities develop more slowly once they reach a high level.
I would guess the stock market will react pretty differently to something like Plan A vs “shut it all down”, for reasonable reasons.
I don’t understand why you think the opening steps are the most politically challenging part given that the opening steps for Plan A plausibly don’t require stopping AI development.
First, slight clarification: the thing I had in mind isn’t the opening step (which is presumably “do some ad hoc deals that build political momentum without too much cost”).
The step I have in mind is “all global compute clusters and fab production are monitored, with buy-in from China, the UK, Europe, etc., with intent for major international escalation of some kind if someone violates the monitoring pact”. This doesn’t directly shut down Nvidia, but it sure is putting some writing on the wall that I would expect Nvidia’s political interests to fight strongly, even if it doesn’t immediately come with a shutdown.
I’m imagining that a Plan A that doesn’t include something like that is more of a Plan A / B hybrid, or some other “not the full Plan A” (based on some other internal Plan A docs I’ve looked at that went into more detail, as of a few weeks ago).
I don’t think there’s any way you get to that point without most major world leaders actually believing-in-their-hearts “if anyone builds it, something real bad is dangerously likely to happen.” And by the point at which people are actually agreeing to international inspection of some kind, I would expect them to be thinking more “okay, will this actually work?” than “what do we have buy-in for?”.
(There is a version where the US enforces it at gunpoint, or at least at economic-sanction-point, without everyone else’s buy-in, but I both don’t expect them to do that and don’t really expect it to work?)
MIRI discusses in the IABIED resources that they would prefer carveouts for narrow bio AI, so it’s not like they’re even advocating for all progress to stop. (Advanced bio AI seems pretty good for the world and seems to capture a lot of the benefits.)
...
I certainly do expect you-et-al to disagree with MIRI-et-al on a bunch of implementation details of the treaty.
But it seems like a version of the treaty that doesn’t at least have the capacity to shut down compute temporarily is a kinda fake version of Plan A, and once you have that, “Shut down” vs “Controlled Takeoff” feels more like arguing details than fundamentals to me.
I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.
Advanced bio AI seems pretty good for the world and seems to capture a lot of the benefits
Huh? No, it doesn’t capture much of the benefits. I would have guessed it captures a tiny fraction of the benefits of advanced AI, even for AIs around the human level where you might want to pause.
But it seems like a version of the treaty that doesn’t at least have the capacity to shut down compute temporarily is a kinda fake version of Plan A, and once you have that, “Shut down” vs “Controlled Takeoff” feels more like arguing details than fundamentals to me.
I agree you will have the capacity to shut down compute temporarily either way; I disagree that there isn’t much of a difference between slowing down takeoff and shutting down all further non-narrow AI development.
I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.
FYI, this is cruxy. I don’t have very strong political-viability intuitions, but it seems like this requires export controls that several (sometimes rivalrous) major nations agree to simultaneously, with at least nontrivial trust for establishing the monitoring process together, which eventually is pretty invasive.
(maybe you are imagining the monitoring is actually mostly done with spy satellites that don’t require much trust or cooperation?)
But like, the last draft of Plan A I saw included “we relocate all the compute to centralized locations in third-party countries” as an eventual goal. That seems pretty crazy?
But like, the last draft of Plan A I saw included “we relocate all the compute to centralized locations in third-party countries” as an eventual goal. That seems pretty crazy?
Yes, this is much harder (from a political will perspective) than compute + fab monitoring, which is part of my point? Like, my view is that, in terms of political will requirements:
compute + fab monitoring << Plan A < Shut it all down
Nod, I agree the centralizing part is harder than non-centralized fab monitoring. But I think a sufficient amount of “non-centralized” fab monitoring is still a much bigger ask than export controls, and the centralization was part of at least one writeup of Plan A, and it seemed pretty weird to include that bit but write off “actual shutdown” as politically intractable.
I’m not trying to say “Plan A is doable and shut it all down is intractable”.
My view is that “shut it all down” probably requires substantially more (but not a huge amount more) political will than Plan A such that it is maybe like 3x less likely to happen given similar amounts of effort from the safety community.
You started by saying:
My main question is “why do you think Shut Down actually costs more political will?”.
So I was trying to respond to this. I think being 3x less likely to happen is actually a pretty big deal; this isn’t some tiny difference, but neither is it “Plan A is doable and shut it all down is intractable”. (And I also think “shut it all down” has various important downsides relative to Plan A; maybe these downsides can be overcome, but by default this makes Plan A look more attractive to me even aside from the political will considerations.)
I think something like Plan A and something like “shut it all down” are both very unlikely to happen, and I’d be pretty sympathetic to describing both as politically intractable (e.g., I think something as good/strong as Plan A is only 5% likely). “Politically intractable” isn’t very precise though, so I think we have to talk more quantitatively.
Note that my view is also that pushing for Plan A isn’t the most leveraged thing for most people to do at the margin; I expect to focus on making Plans C/D go better (with some weight on things like Plan B).
FYI, I think Shut It Down is approximately as likely to happen as “full-fledged Plan A that is sufficiently careful to actually help much more than [the first several stages of Plan A that Plan A and Shut It Down share]”, on account of being simple enough that it’s even really possible to coordinate on it.
I agree they are both pretty unlikely to happen. (Regardless, I think the thing to do is probably “reach for whatever wins seem achievable near term and try to build coordination capital for more wins”)
I think a major possible failure mode of Plan A is “it turns into a giant regulatory-capture molochian boondoggle that both slows things down for a long time in confused, bad ways and reads to the public as a somewhat weirdly cynical plot, which makes people turn against tech progress comparably to, or more than, the average Shut It Down would.” (I don’t have a strong belief about the relative likelihood of that.)
None of those beliefs are particularly strong and I could easily learn a lot that would change all my beliefs.
Seems fine to leave it here. I don’t have any arguments I didn’t already write up in “Shut It Down” is simpler than “Controlled Takeoff”; just stating for the record that I don’t think you’ve put forth an argument that justifies the 3x increase in difficulty of Shut It Down over the fully-fledged version of Plan A. (We might still be imagining different things re: Shut It Down.)
Huh? No, it doesn’t capture much of the benefits. I would have guessed it captures a tiny fraction of the benefits of advanced AI, even for AIs around the human level where you might want to pause.
Where do you think that most of the benefits come from?
Edit: My personal consumption patterns are mostly not relevant to this question, so I moved what was formerly the rest of this comment to a footnote.[1]
Perhaps I am dumb or my personal priorities are different from most people’s, but I expect a large share of the benefits from AI, to my life personally, are going to be biotech advances that e.g. could extend my life or make me smarter.
Like basically the things that could make my life better are 1) somehow being introduced to a compatible romantic partner, 2) cheaper housing, 3) biotech stuff. There isn’t much else.
I guess self-driving cars might make travel easier? But most of the cost of travel is housing.
I care a lot about ending factory farming, but that’s biotechnology again.
I guess AI, if it was trustworthy, could also substantially improve governance, which could have huge benefits to society.
I don’t think you get radical reductions in mortality and radical life extension with (advanced) narrow bio AI without highly capable general AI. (It might be that a key strategy for unlocking much better biotech is highly capable general AIs creating extremely good narrow bio AI, but I don’t think the narrow bio AI which humans will create over the next ~30 years is very likely to suffice.) Like narrow bio AI isn’t going to get you (arbitrarily good) biotech nearly as fast as building generally capable AI would. This seems especially true given that much, much better biotech might require much higher GDP for e.g. running vastly more experiments and using vastly more compute. (TBC, I don’t agree with all aspects of the linked post.)
I also think people care about radical increases in material abundance, which you also don’t get with narrow bio AI.
And the same for entertainment, antidepressants (and other drugs/modifications that might massively improve quality of life by giving people much more control over mood, experiences, etc), and becoming an upload such that you can live a radically different life if you want.
You also don’t have the potential for huge improvements in animal welfare (due to making meat alternatives cheaper, allowing for engineering away suffering in livestock animals, making people wiser, etc.).
I’m focusing on neartermist-style benefits, as in immediate benefits to currently alive (or soon to be born by default) humans or animals. Of course, powerful AI could result in huge numbers of digital minds in the short run and probably is needed for getting to a great future (with a potentially insane number of digital minds and good utilization of the cosmic endowment, etc.) longer term. The first-order effects on benefits of delaying don’t matter that much from a longtermist perspective of course, so I assumed we were fully operating in a neartermist-style frame when talking about benefits.
It seems like I misunderstood your reading of Ray’s claim.
I read Ray as saying “a large fraction of the benefits of advanced AI are only in the biotech sector, and so we could get a large fraction of the benefits by pushing forward on only AI for biotech.”
It sounds like you’re pointing at a somewhat different axis, in response, saying “we won’t get anything close to the benefits of advanced AI agents with only narrow AI systems, because narrow AI systems are just much less helpful.”
(And implicitly, the biotech AIs are either narrow AIs (and therefore not very helpful) or they’re general AIs that are specialized on biotech, in which case you’re not getting the safety benefits you’re imagining getting by only focusing on biotech.)
Ah, I had also misinterpreted Ryan’s response here. “What actually is practical here?” makes sense as a question, and I’m not sure about the answers.
I think one of the MIRI angles here is variants of STEM AI, which might be more general, but whose training set is filtered to include only materials about bio + some related science (avoiding, as much as possible, anything that’d point towards human psychology, geopolitics, programming, AI hardware, etc.). So it will both have less propensity to take over, and be less good at it relative to its power level at bio.
I wasn’t thinking about this when I wrote the previous comment; I’d have phrased it differently if I had been. I agree it’s an open question whether this works. But I feel more optimistic about a controlled-takeoff world that’s taking a step back from “LLMs are trained on the whole internet.”
Also, noting: I don’t believe in a safe, full handoff to artificial AI alignment researchers (because of gradual disempowerment reasons). But fwiw, I think I’d feel pretty good about a STEM AI that’s focused on various flavors of math and conceptual reasoning and somehow avoids human psychology, hardware, and geopolitics, which you don’t do a full handoff to, but which is able to assist pretty substantially with larger subproblems that come up.