Raemon comments on Plans A, B, C, and D for misalignment risk

Raemon 8 Oct 2025 21:23 UTC
LW: 49 AF: 20
11
AF
My main question is “why do you think Shut Down actually costs more political will?”.
I think Plan A and “Shut It Down” both require very similar opening steps that are the most politically challenging part AFAICT, and once the world is even remotely considering those steps, the somewhat different shut-it-down steps don’t seem particularly hard sells.
I also think Plan A “bad implementation” is much more likely, and also much worse (again see “Shut It Down” is simpler than “Controlled Takeoff”).
Gear 2: You need to compare the tractability of Global Shut Down vs Global Controlled Takeoff That Actually Works, as opposed to Something That Looks Close To But Not Actually A Controlled Takeoff.
Along with Gear 3: “Shut it down” is much simpler than “Controlled Takeoff.”
A Global Controlled Takeoff That Works has a lot of moving parts.
You need the international agreement to be capable of making any kind of sensible distinctions between safe and unsafe training runs, or even “marginally safer” vs “marginally less safe” training runs.
You need the international agreement to not turn into a molochian regulatory-captured horror that perversely reverses the intent of the agreement and creates a class of bureaucrats who don’t know anything about AI and use the agreement to dole out favors.
These problems still exist in some versions of Shut It Down too, to be clear (if you’re trying to also ban algorithmic research – a lot of versions of that seem like they leave room to argue about whether agent foundations or interpretability count). But, they at least get coupled with “no large training runs, period.”
I think “guys, everyone just stop” is a way easier Schelling point to coordinate around than “everyone, we’re going to slow down and try to figure out alignment as best we can using current techniques.”
So, I am not currently convinced that Global Controlled Takeoff That Actually Works is any more politically tractable than Global Shut Down.
(Caveat: Insofar as your plan is “well, we will totally get a molochian moral maze horror, but, it’ll generally move slower and that buys time”, eh, okay, seems reasonable. But, at least be clear to yourself about what you’re aiming for)
I agree you do eventually want to go back to Plan A anyway, so I mostly am just not seeing why you really want to treat these as separate plans, rather than a single:
“Okay, we wanna get all the compute centralized and monitored, we want a lot more control over GPU production, we want to buy as much time and proceed as carefully as we can. At any given time, we want the option to either be basically shut down or in controlled-takeoff mode, depending on some conditions on the ground.”
I agree with some of the risks of “geopolitical situation might get harder to have control over” and “humanity generally becoming anti-progress” but these don’t even seem strictly worse in Shutdown World vs Controlled Takeoff world. (in particular in a “shut it down” world where the framing is “we are eventually going to build a thing that everyone agrees is good, we’re just making sure we get it right.”)
But, “guys, this is very dangerous, we are proceeding very carefully before summoning something smarter than us, while trying our best to all reap the benefits of it” seems like a way easier narrative to get everyone bought into than “guys this is dangerous enough to warrant massive GPU monitoring but… still trying to push ahead as fast as we can?”.
- ryan_greenblatt 9 Oct 2025 0:41 UTC
  LW: 19 AF: 8
  0
  AF Parent
  I think Plan A and “Shut It Down” both require very similar opening steps that are the most politically challenging part AFAICT, and once the world is even remotely considering those steps, the somewhat different shut-it-down steps don’t seem particularly hard sells.
  
  I think shutting down all AI development is much more costly than not shutting down all AI development in a pretty straightforward sense that will in fact probably be priced into the required level of political will: Nvidia is in fact much worse off if all AI development shuts down versus if AI development proceeds, but with capabilities developing more slowly once they reach a high level of capabilities.
  
  I would guess the stock market will react pretty differently to something like Plan A vs “shut it all down” for reasonable reasons.
  
  I don’t understand why you think the opening steps are the most politically challenging part given that the opening steps for Plan A plausibly don’t require stopping AI development.
  
  Another point is that many people have pretty reasonable existing objections to “shut it all down”. Here are some example objections people might have that apply more to “shut it all down” than “Plan A”:
  - Shouldn’t we at least proceed until we can’t very confidently proceed safely? Like do you really think the next generation of AIs (given some reasonable additional safeguards + evals to detect/stop unprecedentedly large jumps) is very dangerous?
  - This proposal seems like it would lead to turnkey totalitarianism, making authoritarianism more likely, especially if aggressive suppression of algorithmic research is needed. (And Plan A seems substantially less bad in this regard.) I’m not sold that the risk (especially Plan A vs shut it all down) is high enough to warrant this.
  - This would delay the transformative benefits of AI by a long time while not using that time in a very cost effective way.
  I think this both factors into political will and makes me more reluctant to push for “shut it all down” because I partially buy these views and because I think it’s good to be cooperative/robust-under a variety of pretty reasonable views. Like I do really feel “delaying AI for 30 years results in ~1/4 of the population dying of old age when they otherwise wouldn’t have” from a cooperativeness with other moral views perspective (I put most weight on longtermism myself).
  
  I also think Plan A “bad implementation” is much more likely, and also much worse
  
  Pausing for a long time at a low level of capability seems like it makes the risk of other actors overtaking and destabilizing the pause regime especially bad. E.g., just shutting down AI development in the US is much worse than just implementing Plan A in the US, but this generally applies to any sort of partial non-proliferation/pause. More capable AIs also can help make the regime more stable.
  
  I agree that a bad implementation of Plan A can decay to something more like Plan C or worse where you don’t actually spend that much of the lead time on safety and how you feel about this depends on how you feel about something like Plan C.
  
  One way to put this is that a long pause is probably taking on a bunch more “some actor gets outside the agreement” or “the current regime collapses and you go quickly from here” risk (due to additional time and lower level of capability) with not that much benefit. E.g., like if we seemingly had the political will for a 30 year pause, I’d be pretty worried about this collapsing in 10 years and us doing something much worse than Plan A, while if we start with Plan A then we’ve already gotten a bunch done by the time the regime (potentially) collapses.
  
  I also think that if you don’t stop semiconductor progress (which again, would make political will requirements substantially higher than under Plan A), then there is a real risk of the takeoff being much faster than it would have been by default due to overhang. It’s unclear how bad this is, but I think it’s really nice to have the singularity happen at a point where compute (and ideally fab capacity) is a big bottleneck. Note that both the compute overhang is less extreme under Plan A and you are slowing the takeoff itself at the most leveraged point in Plan A (such that even if you exit somewhat early, you still got most of what you could have hoped for).
  
  (I’m noticing that we’re calling one proposal “shut it all down” and the other “Plan A” (even though it’s just my favorite proposal for what to do with this level of political will) which is pretty obviously biased naming as a side effect of how I’ve introduced this proposal. I’ll keep using this naming, but readers should try to adjust for this bias as applicable.)
  
  again see “Shut It Down” is simpler than “Controlled Takeoff”
  
  I agree “shut it all down” is a simpler proposal (in initial implementation) and this is a big advantage. If you think massively augmented humans are likely as a result of “shut it all down”, then from our perspective it’s overall simpler, not just in terms of initial implementation. Otherwise, someone still has to eventually handle the situation which is potentially complicated, especially if alignment moonshots don’t work out.
  
  I agree you do eventually want to go back to Plan A anyway, so I mostly am just not seeing why you really want to treat these as separate plans
  
  Notably, the “shut it all down” plan proposed in If Anyone Builds It, Everyone Dies involves stopping AI progress for a long period at the current level of capability, so it really is a separate plan. I agree you sometimes want to prevent development beyond a capability cap and sometimes you want to proceed, but the question from my perspective is more like “at what level of capability do you want to spend this time” and “how much time do you realistically have”.
  
  I agree with some of the risks of “geopolitical situation might get harder to have control over” and “humanity generally becoming anti-progress” but these don’t even seem strictly worse in Shutdown World vs Controlled Takeoff world.
  
  I think “humanity generally becoming anti-progress” (and stopping AI development much longer term) seems much more likely if you’re stopping all AI progress for decades (both evidentially and causally).
  
  I think the geopolitical situation in 30 years naively looks scary due to the rise of China and the relative fall of Europe and I don’t think general cultural/societal progress looks fast enough on that time frame to overcome this. I think the current CCP having control over most/all of the universe seems like 50% as bad as AI takeover by my lights, though I’m sympathetic to being more optimistic, especially about the version of the CCP that exists in 30 years.
  
  Responding to your other comment
  
  One way the geopolitical situation might get worse is “time passes, and, all kinds of stuff can change when time passes.”
  
  Another way it can get worse is “the current dynamics still involve a feeling of being rushed, and time pressure, and meanwhile the international agreements we have leave a lot more wiggle room and more confused spirit-of-the-law about how people are allowed to maneuver.” This could cause the geopolitical situation to get worse faster than it would otherwise.
  
  Which of those is worse? idk, I’m not a geopolitical expert. But, it’s why it seems pretty obviously not ‘strictly worse’ (which is a high bar, with IMO a higher burden of proof) under Shut It Down.
  
  I think China predictably getting relatively (much?) more powerful is pretty relevant. I agree it’s not strictly worse, but I think “humanity generally becoming anti-progress” is ~strictly worse under “shut it all down”.
  
  I agree it’s messy and the comparison is complicated.
  
  Also, note “shut it all down” is not like it’s actually going to be permanent.
  
  Sure, the intention isn’t that it is permanent, but I think there is a real risk of it lasting a long time until the agreement is suddenly exited in a very non-ideal way (and some small chance of this altering culture for the worse longer term and some smaller chance of this resulting in humanity never building powerful AI before it is too late).
  - Eli Tyre 10 Oct 2025 3:25 UTC
    29 points
    18
    Parent
    I think the current CCP having control over most/all of the universe seems like 50% as bad as AI takeover by my lights
    This is a wild claim to me.
    
    Can you elaborate on why you think this?
    - ryan_greenblatt 10 Oct 2025 17:15 UTC
      9 points
      3
      Parent
      (I assume you’re asking “why isn’t it much less bad than AI takeover” as opposed to “isn’t it almost as bad as AI takeover, like 98% as bad”.)
      
      I care most about the long-run utilization of cosmic resources, so this dominates my thinking about this sort of question. I think it’s very easy for humans to use cosmic resources poorly from my perspective and I think this is more likely if resources are controlled by an autocratic regime, especially an autocratic regime where one person holds most of the power (which seems reasonably likely for a post-AGI CCP). In other words, I think it’s pretty easy to lose half of the value of the long-run future (or more) based on which humans are in control and how this goes.
      
      I’ll compare the CCP having full control to broadly democratic human control (e.g. most cosmic resources are controlled by some kinda democratic system or auctioned while retaining democracy).
      
      We could break this down into the likelihood of carefully reflecting and then how much this reflection converges. I think control by an autocratic regime makes reflection less likely and that selection effects around who controls the CCP are bad, making post-reflection convergence worse (and it’s unclear to me how much I expect reflection to converge / be reasonable in general).
      
      Additionally, I think having a reasonably large number of people having power substantially improves the situation due to the potential for trade (e.g., many people might not care much at all about long-run resource use in far away galaxies, but this is by far the dominant source of value from my perspective!) and the beneficial effects on epistemics/culture (though this is less clear).
      
      Part of this is that I think pretty crazy considerations might be very important to having the future go close to as well as it could (e.g. acausal trade etc) and this requires some values+epistemics combinations which aren’t obviously going to happen.
      
      This analysis assumes that the AI that takes over is a pure paperclipper which doesn’t care about anything else. Taking into account the AI that takes over potentially having better values doesn’t make a big difference to the bottom line, but the fact that AIs that take over might be more likely to do stuff like acausal trade (than e.g. the CCP) makes “human autocracy” look relatively worse compared to AI takeover.
      
      See also Human takeover might be worse than AI takeover, though I disagree with the values comparisons. (I think AIs that take over are much more likely to have values that I care about very little than this post implies.)
      
      As far as outcomes for currently alive humans, I think full CCP control is maybe like 15% as bad as AI takeover relative to broadly democratic human control. AI takeover maybe kills a bit over 50% of humans in expectation while full CCP control maybe kills like 5% of humans in expectation (including outcomes for people that are as bad as death and involve modifying them greatly), but also has some chance of imposing terrible outcomes on the remaining people which are still better than death.
      
      Partial CCP control probably looks much less bad?
      
      None of this is to say we shouldn’t cooperate on AI with China and the CCP. I think cooperation on AI would be great and I also think that if the US (or an AI company) ended up being extremely powerful, that actor shouldn’t violate Chinese sovereignty.
  - habryka 9 Oct 2025 2:15 UTC
    19 points
    15
    Parent
    Shouldn’t we at least proceed until we can’t very confidently proceed safely?
    I mean, I think the odds of AI ending up uncontrollably powerful are on the order of 1-3% for the next generation of models. That seems far far too high. I think we are right now in a position where we can’t very confidently proceed safely.
    - ryan_greenblatt 9 Oct 2025 2:55 UTC
      4 points
      2
      Parent
      Hmm, we probably disagree about the risk depending on what you mean by “uncontrollably powerful”, especially if the AI company didn’t have any particular reason to think the jump would be especially high (as is typically the case for new models).
      
      I’d guess it’s hard for a model to be “uncontrollably powerful” (in the sense that we would be taking on a bunch of risk from a Plan A perspective) unless it is at least pretty close to being able to automate AI R&D so this requires a pretty huge capabilities jump.
      
      My guess of direct risk^[1] from the next generation of models (as in, the next major release from Anthropic+xAI+OpenAI+GDM) would be like 0.3% and I’d be like 3x lower if we were proceeding decently cautiously in a Plan A style scenario (e.g. if we had an indication the model might be much more powerful, we’d scale only a bit at a time).
      
      My estimate for 0.3%: My median is 8.5 years and maybe there are ~3-ish major model releases per year, so assuming uniform we’d get 2% chance of going all the way to AI R&D automation this generation. Then, I cut by a factor of like 10 due to this being a much larger discontinuity than we’ve seen before and by another factor of 2 from this not being guaranteed to directly result in takeover. Then, I go back up a bunch due to model uncertainty and thinking that we might be especially likely to see a big advance around now.
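
The arithmetic in the paragraph above can be reconstructed as a minimal sketch. Two things here are assumptions rather than anything stated in the comment: that "assuming uniform" means the 50% of probability mass before the median is spread evenly across the releases expected before it, and the size of the final upward adjustment, which the comment only describes as "a bunch" and which is backed out here from the stated 0.3%.

```python
# Back-of-envelope reconstruction of the 0.3% estimate.
# Inputs taken from the comment:
median_years = 8.5          # median time until full AI R&D automation
releases_per_year = 3       # "~3-ish major model releases per year"
discontinuity_cut = 10      # cut for an unprecedentedly large jump
takeover_cut = 2            # cut because automation need not mean takeover
stated_estimate = 0.003     # the final 0.3% figure

# ASSUMPTION: half of the probability mass lies before the median and is
# spread uniformly over the releases expected before the median.
mass_before_median = 0.5
releases_before_median = median_years * releases_per_year

# Chance that this particular generation goes all the way (about 2%).
p_this_generation = mass_before_median / releases_before_median

# Apply the two downward cuts (about 0.1%).
p_after_cuts = p_this_generation / (discontinuity_cut * takeover_cut)

# ASSUMPTION: the upward adjustment for model uncertainty is whatever factor
# recovers the stated 0.3%; it is not given explicitly in the comment.
implied_upward_factor = stated_estimate / p_after_cuts

print(f"per-generation chance: {p_this_generation:.2%}")
print(f"after cuts:            {p_after_cuts:.3%}")
print(f"implied upward factor: {implied_upward_factor:.1f}x")
```

Under these assumptions the two cuts bring the ~2% figure down to roughly 0.1%, so the step back up to 0.3% corresponds to a factor of about 3.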
      
      Edit: TBC, I think it’s reasonable to describe the current state of affairs as “we can’t very confidently proceed safely” but I also think the view “we can very confidently proceed safely (e.g., takeover risk from next model generation is <0.025%) given being decently cautious” is pretty reasonable.
      
      ^[1] By direct risk, I mean including takeover itself and risk increases through mechanisms like self-exfil, rogue internal deployment but not including sabotaging research the AI is supposed to be doing.
  - Raemon 9 Oct 2025 2:04 UTC
    LW: 9 AF: 4
    4
    AF Parent
    Thanks. I’ll leave some responses but it feels fine to leave it here for now.
    I think shutting down all AI development is much more costly than not shutting down all AI development in a pretty straightforward sense that will in fact probably be priced into the required level of political will: Nvidia is in fact much worse off if all AI development shuts down versus if AI development proceeds, but with capabilities developing more slowly once they reach a high level of capabilities.
    I would guess the stock market will react pretty differently to something like Plan A vs “shut it all down” for reasonable reasons.
    I don’t understand why you think the opening steps are the most politically challenging part given that the opening steps for Plan A plausibly don’t require stopping AI development.
    First, slight clarification: the thing I had in mind isn’t the opening step (which is presumably “do some ad hoc deals that build political momentum without too much cost”).
    The step I have in mind is “all global compute clusters and fab production are monitored, with buy-in from China, UK, Europe etc, with intent for major international escalation of some kind if someone violates the monitoring pact”. This doesn’t directly shut down Nvidia, but it sure is putting some writing on the wall that I would expect Nvidia’s political interests to fight strongly even if it doesn’t immediately come with a shutdown.
    I’m imagining a Plan A that doesn’t include something like that is more like a Plan A / B hybrid or some other “not the full Plan A.” (based on some other internal Plan A docs I’ve looked at that went into more detail as of a few weeks ago).
    I don’t think there’s any way you get to that point without most major world leaders actually believing-in-their-heart “if anyone builds it, something real bad is dangerously likely to happen.” And by the point people are actually agreeing to have international inspection of some kind, I would expect people to be thinking more “okay will this actually work?” than “what do we have buy-in for?”.
    (There is a version where the US enforces it at gunpoint or at least economic-sanction-point without everyone else’s buy-in but I both don’t expect them to do that and don’t really expect it to work?)
    MIRI discusses in the IABIED resources that they would prefer carveouts for narrow bio AI, so it’s not like they’re even advocating for all progress to stop. (Advanced bio AI seems pretty good for the world and to capture a lot of the benefits).
    ...
    I certainly do expect you-et-al to disagree with MIRI-et-al on a bunch of implementation details of the treaty.
    But, it seems like a version of the treaty that doesn’t at least have the capacity to shut down compute temporarily is a kinda fake version of Plan A, and once you have that, “Shut down” vs “Controlled Takeoff” feels more like arguing details than fundamentals to me.
    - ryan_greenblatt 9 Oct 2025 2:29 UTC
      LW: 6 AF: 4
      0
      AF Parent
      I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.
      
      Advanced bio AI seems pretty good for the world and to capture a lot of the benefits
      
      Huh? No it doesn’t capture much of the benefits. I would have guessed it captures a tiny fraction of the benefits for advanced AI, even for AIs around the level where you might want to pause at human level.
      
      But, it seems like a version of the treaty that doesn’t at least have the capacity to shut down compute temporarily is a kinda fake version of Plan A, and once you have that, “Shut down” vs “Controlled Takeoff” feels more like arguing details than fundamentals to me.
      
      I agree you will have the capacity to shut down compute temporarily either way; I disagree that there isn’t much of a difference between slowing down takeoff and shutting down all further non-narrow AI development.
      - Raemon 9 Oct 2025 3:20 UTC
        LW: 5 AF: 2
        3
        AF Parent
        I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.
        FYI this is cruxy. I don’t have very strong political-viability intuitions, but it seems like this requires export controls that several (sometimes rivalrous) major nations agree to simultaneously, with at least nontrivial trust for establishing the monitoring process together, which eventually is pretty invasive.
        (maybe you are imagining the monitoring is actually mostly done with spy satellites that don’t require much trust or cooperation?)
        But like, the last draft of Plan A I saw included “we relocate all the compute to centralized locations in third party countries” as an eventual goal. That seems pretty crazy?
        ryan_greenblatt 9 Oct 2025 3:58 UTC
        LW: 5 AF: 3
        2
        AF Parent
        
        But like, the last draft of Plan A I saw included “we relocate all the compute to centralized locations in third party countries” as an eventual goal. That seems pretty crazy?
        
        Yes, this is much harder (from a political will perspective) than compute + fab monitoring which is part of my point? Like my view is that in terms of political will requirements:
        
        compute + fab monitoring << Plan A < Shut it all down
        Raemon 9 Oct 2025 4:22 UTC
        LW: 4 AF: 2
        0
        AF Parent
        Nod, I agree the centralizing part is harder than non-centralized fab monitoring. But I think a sufficient amount of “non-centralized” fab monitoring is still a much bigger ask than export controls, and the centralization was part of at least one writeup of Plan A, and it seemed pretty weird to include that bit but write off “actual shutdown” as politically intractable.
        ryan_greenblatt 9 Oct 2025 16:30 UTC
        LW: 6 AF: 4
        2
        AF Parent
        I’m not trying to say “Plan A is doable and shut it all down is intractable”.
        
        My view is that “shut it all down” probably requires substantially more (but not a huge amount more) political will than Plan A such that it is maybe like 3x less likely to happen given similar amounts of effort from the safety community.
        
        You started by saying:
        
        My main question is “why do you think Shut Down actually costs more political will?”.
        
        So I was trying to respond to this. I think 3x less likely to happen is actually a pretty big deal; this isn’t some tiny difference, but neither is it “Plan A is doable and shut it all down is intractable”. (And I also think “shut it all down” has various important downsides relative to Plan A, maybe these downsides can be overcome, but by default this makes Plan A look more attractive to me even aside from the political will considerations.)
        
        I think something like Plan A or “shut it all down” are both very unlikely to happen and I’d be pretty sympathetic to describing both as politically intractable (e.g., I think something as good/strong as Plan A is only 5% likely). “politically intractable” isn’t very precise though, so I think we have to talk more quantitatively.
        
        Note that my view is also that I think pushing for Plan A isn’t the most leveraged thing for most people to do at the margin; I expect to focus on making Plans C/D go better (with some weight on things like Plan B).
        Raemon 9 Oct 2025 17:42 UTC
        LW: 4 AF: 2
        0
        AF Parent
        Nod.
        FYI, I think Shut It Down is approximately as likely to happen as “Full-fledged Plan A that is careful enough to actually help much more than [the first several stages of Plan A that Plan A and Shut It Down share]”, on account of being simple enough that it’s actually possible to coordinate on it.
        I agree they are both pretty unlikely to happen. (Regardless, I think the thing to do is probably “reach for whatever wins seem achievable near term and try to build coordination capital for more wins”)
        I think a major possible failure mode of Plan A is “it turns into a giant regulatory capture molochian boondoggle that both slows things down for a long time in confused bad ways and reads to the public as a somewhat weirdly cynical plot, which makes people turn against tech progress comparably or more than the average Shut It Down would.” (I don’t have a strong belief about the relative likelihoods of that
        None of those beliefs are particularly strong and I could easily learn a lot that would change all my beliefs.
        Seems fine to leave it here. I don’t have more arguments I didn’t already write up in “Shut It Down” is simpler than “Controlled Takeoff”, just stating for the record I don’t think you’ve put forth an argument that justifies the 3x increase in difficulty of Shut It Down over the fully fledged version of Plan A. (We might still be imagining different things re: Shut It Down)
      - Eli Tyre 10 Oct 2025 3:44 UTC
        2 points
        0
        Parent
        Huh? No it doesn’t capture much of the benefits. I would have guessed it captures a tiny fraction of the benefits for advanced AI, even for AIs around the level where you might want to pause at human level.
        Where do you think that most of the benefits come from?
        Edit: My personal consumption patterns are mostly not relevant to this question, so I moved what was formerly the rest of this comment to a footnote.^[1]
        
        ^[1] Perhaps I am dumb or my personal priorities are different than most people’s, but I expect a large share of the benefits from AI to my life, personally, to be biotech advances that, e.g., could extend my life or make me smarter.
        
        Like basically the things that could make my life better are 1) somehow being introduced to a compatible romantic partner, 2) cheaper housing, 3) biotech stuff. There isn’t much else.
        
        I guess self-driving cars might make travel easier? But most of the cost of travel is housing.
        
        I care a lot about ending factory farming, but that’s biotechnology again.
        
        I guess AI, if it was trustworthy, could also substantially improve governance, which could have huge benefits to society.
        ryan_greenblatt 10 Oct 2025 16:50 UTC
        8 points
        3
        Parent
        I don’t think you get radical reductions in mortality and radical life extension with (advanced) narrow bio AI without highly capable general AI. (It might be that a key strategy for unlocking much better biotech is highly capable general AIs creating extremely good narrow bio AI, but I don’t think the narrow bio AI which humans will create over the next ~30 years is very likely to suffice.) Like narrow bio AI isn’t going to get you (arbitrarily good) biotech nearly as fast as building generally capable AI would. This seems especially true given that much, much better biotech might require much higher GDP for e.g. running vastly more experiments and using vastly more compute. (TBC, I don’t agree with all aspects of the linked post.)
        
        I also think people care about radical increases in material abundance which you also don’t get with narrow bio AI.
        
        And the same for entertainment, antidepressants (and other drugs/modifications that might massively improve quality of life by giving people much more control over mood, experiences, etc), and becoming an upload such that you can live a radically different life if you want.
        
        You also don’t have the potential for huge improvements in animal welfare (due to making meat alternatives cheaper, allowing for engineering away suffering in livestock animals, making people wiser, etc.)
        
        I’m focusing on neartermist-style benefits; as in, immediate benefits to currently alive (or soon to be born by default) humans or animals. Of course, powerful AI could result in huge numbers of digital minds in the short run and probably is needed for getting to a great future (with a potentially insane amount of digital minds and good utilization of the cosmic endowment etc.) longer term. The first order effects on benefits of delaying don’t matter that much from a longtermist perspective of course, so I assumed we were operating fully in a neartermist-style frame when talking about benefits.
        Eli Tyre 10 Oct 2025 17:31 UTC
        2 points
        0
        Parent
        It seems like I misunderstood your reading of Ray’s claim.
        
        I read Ray as saying “a large fraction of the benefits of advanced AI are only in the biotech sector, and so we could get a large fraction of the benefits by pushing forward on only AI for biotech.”
        
        It sounds like you’re pointing at a somewhat different axis, in response, saying “we won’t get anything close to the benefits of advanced AI agents with only narrow AI systems, because narrow AI systems are just much less helpful.”
        (And implicitly, the biotech AIs are either narrow AIs (and therefore not very helpful) or they’re general AIs that are specialized on biotech, in which case you’re not getting the safety benefits you’re imagining getting by only focusing on biotech.)
        Raemon 10 Oct 2025 18:03 UTC
        2 points
        1
        Parent
        Ah, I had also misinterpreted Ryan’s response here. “What actually is practical here?” makes sense as a question and I’m not sure about the answers.
        I think one of the MIRI angles here is variants of STEM AI, which might be more general, but whose training set is filtered to be only materials about bio + some related science (and avoiding as much as possible anything that’d point towards human psychology, geopolitics, programming, AI hardware, etc). So it both will have less propensity to take over, and be less good at it relative to its power level at bio.
        I wasn’t thinking about this when I wrote the previous comment; I’d have phrased it differently if I had been. I agree it’s an open question whether this works. But I feel more optimistic about a controlled-takeoff world that’s taking a step back from “LLMs are trained on the whole internet.”
        Also, noting: I don’t believe in a safe, full handoff to artificial AI alignment researchers (because of gradual disempowerment reasons). But, fwiw I think I’d feel pretty good about STEM AI that’s focused on various flavors of math and conceptual reasoning that somehow avoids human psychology, hardware, and geopolitics, which you don’t do a full handoff to, but which is able to assist pretty substantially with larger subproblems that come up.
  - Eli Tyre 10 Oct 2025 3:24 UTC
    5 points
    0
    Parent
    Thank you for writing this!
    
    Some important things that I learned / clarified for myself from this comment:
    Many plans depend on preserving the political will to maintain a geopolitical regime that isn’t the Nash equilibrium, for years or decades. A key consideration for those plans is “how much of the benefit of this plan will we have gotten, if the controlled regime breaks down early?”
    Plans that depend on having human level AIs do alignment work (if those plans work at all), don’t have linear payoff in time spent working, but they are much closer to linear than plans that depend on genetically engineered super geniuses doing the alignment work.
    In the AI alignment researcher plan, the AIs can be making progress as soon as they’re developed. In the super-genius plan, we need to develop the genetic engineering techniques and (potentially) have the super-geniuses grow up before they can get to work. The benefits to super-geniuses are backloaded, instead of linear.
    (I don’t want to overstate this difference however, because if the plan of automating alignment research is just fundamentally unworkable, it doesn’t matter that the returns to automated alignment research would be closer to linear in time, if it did work. The more important crux is “could this work at all?”)
    The complexity of “controlled takeoff” is in setting up the situation so that things are actually being done responsibly and safely, instead of only seeming so to people that aren’t equipped to judge. The complexity of “shut it all down” is in setting up an off-ramp. If “shut it all down” is also including “genetically engineer super-geniuses” as part of the plan, then it’s not clearly simpler than “controlled takeoff.”
- Raemon 8 Oct 2025 22:44 UTC
  LW: 4 AF: 3
  2
  AF Parent
  Responding to some disagree reacts:
  that are the most politically challenging part AFAICT, and once the world is even remotely considering those steps, the somewhat different shut-it-down steps don’t seem particularly hard sells.
  Seems good to register disagreement, but, fyi I have no idea why you think that.
  Re:
  “these [geopolitical situation getting worse] don’t even seem strictly worse in Shutdown World vs Controlled Takeoff world”:
  One way the geopolitical situation might get worse is “time passes, and, all kinds of stuff can change when time passes.”
  Another way it can get worse is “the current dynamics still involve a feeling of being rushed, and time pressure, and meanwhile the international agreements we have leave a lot more wiggle room and more confused spirit-of-the-law about how people are allowed to maneuver.” This could cause the geopolitical situation to get worse faster than it would otherwise.
  Which of those is worse? idk, I’m not a geopolitical expert. But, it’s why it seems pretty obviously not ‘strictly worse’ (which is a high bar, with IMO a higher burden of proof) under Shut It Down.
  (Also, note “shut it all down” is not like it’s actually going to be permanent. Any international treaty/agreement at any time can be reversed by the involved nations deciding “guys, actually we have now voted to leave this agreement”, with some associated negotiations along the way.)
  - ryan_greenblatt 8 Oct 2025 22:48 UTC
    LW: 2 AF: 2
    0
    AF Parent
    I’m going to default to bowing out, but if you want to bid for me to engage a bunch, you can.
    - Raemon 8 Oct 2025 23:29 UTC
      LW: 12 AF: 5
      2
      AF Parent
      I dunno, this seems really important and I am really confused why y’all are oriented this way.
      Yes, I very much would like responses on these and my other comment, although no worries if you want to take a bit more time to address more thoroughly.
- ryan_greenblatt 8 Oct 2025 22:19 UTC
  LW: 4 AF: 4
  0
  AF Parent
  
  But, “guys, this is very dangerous, we are proceeding very carefully before summoning something smarter than us, while trying our best to all reap the benefits of it” seems like a way easier narrative to get everyone bought into than “guys this is dangerous enough to warrant massive GPU monitoring but… still trying to push ahead as fast as we can?”.
  
  Wouldn’t the narrative for Plan A be more like “we should be cautious and slow down if we aren’t confident about safety, and we’ll need to build the ability to slow down a lot”? While the narrative for “shut it all down” would have to involve something like “proceeding with any further development is too risky given the current situation”.
  - Raemon 8 Oct 2025 22:37 UTC
    LW: 4 AF: 2
    0
    AF Parent
    I’m not 100% sure what Nate/Eliezer believe. I know they do think eventually we should build superintelligence, and that it’d be an existential catastrophe if we didn’t.
    I think they think (and, I agree) that we should be at least prepared for things that are more like 20-50 year pauses, if it turns out to take that long, but (at least speaking for myself), this isn’t because it’s intrinsically desirable to pause for 50 years. It’s because you should remain shut down until you actually confidently know what you’re doing, with no pressure to convince yourself/each other that you’re ready when you are not.
    It might be that AI-accelerated alignment research means you don’t need a 20-50 year pause, but that should be a decision the governing body makes based on how things are playing out, not baked into the initial assumption. That way we don’t need to take risks like “run tons of very smart AIs in parallel very fast” when we’re only somewhat confident about their longterm alignment, which opens us up to more gradual disempowerment / slowly-outmaneuvered risk, or eventual death by evolution.
    I haven’t read the entirety of the IABIED website proposed treaty draft yet, but it includes this line, which has a flavor of “re-evaluate how things are going.”
    Three years after the entry into force of this Treaty, a Conference of the Parties shall be held in Geneva, Switzerland, to review the operation of this Treaty with a view to assuring that the purposes of the Preamble and the provisions of the Treaty are being realized. At intervals of three years thereafter, Parties to the Treaty will convene further conferences with the same objective of reviewing the operation of the Treaty.
    - ryan_greenblatt 9 Oct 2025 0:45 UTC
      LW: 4 AF: 4
      2
      AF Parent
      Sure, I agree that Nate/Eliezer think we should eventually build superintelligence and don’t want to cause a pause that lasts forever. In the comment you’re responding to, I’m just talking about difficulty in getting people to buy the narrative.
      
      More generally, what Nate/Eliezer think is best doesn’t resolve concerns with the pause going poorly because something else happens in practice. (This includes the pause going on too long or leading to a general anti-AI/anti-digital-minds/anti-progress view which is costly for the longer run future.) (This applies to the proposed Plan A as well, but I think poor implementation is less scary in various ways and the particular risk of ~anti-progress forever is less strong.)