Promoted to curated: I have various pretty substantial critiques of this work, but I do overall think this is a pretty great effort at crossing the inferential distance from people who think AGI will be a huge deal and potentially dangerous, to the US government and national security apparatus.
The thing I feel most unhappy about is that the document follows a pattern that Situational Awareness also had: it keeps framing various things it wants to happen as “inevitable”, while also arguing that they are a good idea, in a way that feels to me like it is trying too hard to set up some kind of self-fulfilling prophecy.
But overall, I feel like this document speaks with surprising candor and clarity about many things that have often been left unsaid. I particularly appreciated that it explicitly includes conventional ballistic escalation as part of a sabotage strategy for datacenters. Relevant quotes:
Should these measures falter, some leaders may contemplate kinetic attacks on datacenters, arguing that allowing one actor to risk dominating or destroying the world are graver dangers, though kinetic attacks are likely unnecessary. Finally, under dire circumstances, states may resort to broader hostilities by climbing up existing escalation ladders or threatening non-AI assets. We refer to attacks against rival AI projects as “maiming attacks.”
I also particularly appreciated this proposed policy for how to handle AIs capable of recursive self-improvement:
In the near term, geopolitical events may prevent attempts at an intelligence recursion. Looking further ahead, if humanity chooses to attempt an intelligence recursion, it should happen in a controlled environment with extensive preparation and oversight—not under extreme competitive pressure that induces a high risk tolerance.
I particularly appreciated its coverage of explicitly including conventional ballistic escalation as part of a sabotage strategy for datacenters
One thing I find very confusing about existing gaps between the AI policy community and the national security community is that natsec policymakers have already explicitly said that kinetic (i.e., blowing things up) responses are acceptable for cyberattacks under some circumstances, while the AI policy community seems to somehow unconsciously rule those sorts of responses out of the policy window. (To be clear: any day that American servicemembers go into combat is a bad day, I don’t think we should choose such approaches lightly.)
My sense is that a lot of the x-risk-oriented AI policy community is very focused on avoiding “gaffes” and has a very short-term, opportunistic relationship with reputation, public relations, and that kind of thing. My sense is that people in the space don’t believe being principled or consistently honest basically ever gets rewarded or recognized, so the right strategy is to try to identify where the Overton window is, push only very conservatively on expanding it, and focus on staying in the good graces of whatever process determines social standing, which is generally assumed to be pretty random and arbitrary.
I think many people in the space, if pushed, would of course acknowledge that kinetic responses are appropriate in many AI scenarios, but they would judge bringing it up as an unnecessarily risky gaffe, and that perception of a gaffe creates a pretty effective enforcement regime under which people basically never mention it, lest they be judged as politically irresponsible.
I think I am too much inside the DC policy world to really understand why this is seen as a gaffe. Can you unpack why they see it that way? In the DC world, by contrast, “yes, of course, this is a major national security threat, and no, of course you could never use military capabilities to address it” would be a gaffe.
I mean, you saw people make fun of it when Eliezer said it, and then my guess is people conservatively assumed that this would generalize to the future. I’ve had conversations with people where they tried to convince me that Eliezer mentioning kinetic escalation was one of the worst things anyone has ever done for AI policy, and they kept pointing to Twitter threads and conversations where opponents made fun of it as evidence. I think there clearly was something real here, but I also think people really fail to understand the communication dynamics involved.
You’re missing some ways Eliezer could have predictably done better with the Time article, if he had been framing it for national security folks (rather than as an attempt at brutal honesty, or perhaps most accurately a cri de coeur).
@davekasten—Eliezer wasn’t arguing for bombing as retaliation for a cyberattack. Rather, he proposed it as a preemptive measure against noncompliant AI development:
If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
If you zoom out several layers of abstraction, that’s not too different from the escalation ladder concept described in this paper. A crucial difference, though, is that Eliezer doesn’t mention escalation ladders at all—or other concepts that would help neutral readers think “OK, this guy gets how big a lift all this would be, and he has some ideas for building blocks to enact it”. Examples include “how do you get an international agreement on this stuff”, “how do you track all the chips”, “how do you prevent people from building super-powerful AI as the compute threshold lowers”, “what about all the benefits of AI that we’d be passing up” (besides a brief mention that narrow-AI-for-bio might be worth it), and “how confident can we be that we’d know if someone was contravening the deal”.
Second, there was a huge inferential gap to this idea of AGI as a key national security threat—there’s still a large one today, despite the rhetoric around AGI. And Eliezer doesn’t do enough meeting in the middle here.
He gives the high-level argument that is sufficient for him but is not, and was not, convincing to most people—that AI is growing fast by some metrics, can in principle be superhuman, etc. Unfortunately, most people in government don’t have the combination of capacity, inclination, and time to assess these kinds of first-principles arguments for themselves, and they really need concreteness in the form of evidence or expert opinion.
Also, frankly, I just think Eliezer is wrong to be as confident in his view of “doom by default” as he is, and the strategic picture looks very, very different if you place, say, 20% or even 50% probability on it.
If I had Eliezer’s views, I’d probably focus on evals and red-teaming-type research to provide fire alarms, convince technical folks that p(doom) was really high, and then use that technical consensus or quasi-consensus to shift policy. This isn’t totally distinct from what Eliezer did in the past with more abstract arguments, and it kind of worked (there are a lot of people with >10% p(doom) in the policy world, and there was that 2023 moment when everyone was talking about it). I think in worlds where Eliezer’s right, but timelines are more like 2030 than 2027, there’s real scope for people to be convinced of high p(doom) as AI advances, and that could motivate some real policy change.
I think it’s fine that Eliezer wrote it, though. Not maximally strategic by any means, but the man’s done a lot and he’s allowed his Hail Mary outreach plans.
I think at the time I and others were worried this would look bad for “safety as a whole”, but at this point concerns about AI risk are common and varied enough, and people with those concerns often have strong local reputations with different groups. So this is no longer as big an issue, which I think is really healthy for AI risk folks—it means we can have Pause AI and Eliezer and Hendrycks and whoever all doing their own things, able to say “no, I’m not like those folks, here’s my POV”, and not feeling like they should get a veto over each other. And in retrospect I think we should have anticipated and embraced this vision earlier on.
tbh, this is part of what I think went wrong with EA—a shared sense that community reputation was a resource everyone benefited from and wanted to protect and polish, and that people should get vetoes over what others do and say. I think it’s healthy that there’s much less of a centralized and burnished “EA brand” these days, and much more of a bunch of people following their own visions of the good. Though there’s still the problem of Open Phil being a central node in the network, through which reputation effects flow.