Now, let’s remember that we are discussing the AI 2027 scenario, in which nanobots and Dyson swarms are listed as “emerging technology” by 2030. The efficiency gains that this implies are also a reason to be optimistic about the widespread deployment of the above countermeasures, despite the fact that, today in 2025, we live in a world where humans are slow and lazy and large portions of government services still run on pen and paper (without any valid security justification). If the world’s strongest AI can turn the world’s forests and fields into factories and solar farms by 2030, the world’s second-strongest AI will be able to install a bunch of sensors and lamps and filters in our buildings by 2030.
I agree that for any particular strategy OpenBrain’s misaligned ASIs might take to attempt a hard-power takeover, such as bioweapons, that strategy could be foiled by careful preparation and deployment of countermeasures, and said countermeasures could be prepared and deployed quickly enough by a rival friendly/aligned ASI + tech company.
However, I’m predicting that power will have concentrated/consolidated too much by this point. E.g. in the slowdown ending, the US companies merge. Also, the speed of takeoff is such that e.g. a six-month lead is probably too big of a lead; if the aligned AIs trying to defend against the misaligned ASI are six months behind, I fear that they’ll lose. (One way they could lose is by the more powerful ASI thinking of a strategy that they didn’t think of, or finding some way to undermine or sabotage their countermeasures...)
I’d feel more optimistic if e.g. there were multiple US companies that were all within 3 months of the frontier even during the intelligence explosion. However, even then, it has to be the case that at least one of those companies’ AIs is aligned/virtuous/etc. And that’s far from certain; in fact, it seems unlikely given race dynamics. I expect that the “alignment taxes” companies will need to pay to get aligned AGIs will set them back by more than 3 months.
Hmm, given that there are multi-year delays to rolling out broad physical infrastructure (and hence to AI takeover), a six-month delay seems fine.
And I do think Vitalik’s view should make us much happier about a world where just one lab solves alignment but others don’t. And it’s a reason to oppose centralizing to just one AGI project (which I think you support?).
Importantly, if there are multiple misaligned superintelligences, and no aligned superintelligence, it seems likely that they will be motivated and able to coordinate with each other to overthrow humanity and divide the spoils.
This seems non-obvious to me (or at least not a slam dunk, is really what I think). It may be easier for misaligned AI 1 to strike a deal with humanity in which it uses humans’ resources to defeat AIs 2 and 3 in exchange for, say, 80% of the lightcone (as opposed to splitting it three ways with the other AIs).
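(A rough sketch of the payoff comparison behind this intuition: the 80% figure and the even three-way split are illustrative assumptions, and $p_{\text{coalition}}$, $p_{\text{deal}}$ are hypothetical win probabilities for the two paths, not quantities from the scenario.)

$$
\mathrm{EV}_{\text{coalition}} = p_{\text{coalition}} \cdot \tfrac{1}{3},
\qquad
\mathrm{EV}_{\text{deal}} = p_{\text{deal}} \cdot 0.8,
$$

$$
\mathrm{EV}_{\text{deal}} > \mathrm{EV}_{\text{coalition}}
\iff
\frac{p_{\text{deal}}}{p_{\text{coalition}}} > \frac{1/3}{0.8} \approx 0.42.
$$

So on these illustrative numbers, the deal with humanity comes out ahead for AI 1 whenever siding with humanity preserves at least roughly 42% of the win probability it would have had in the three-way coalition.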
I’m not actually sure how well this applies in the exact situation Daniel describes (I’d need to think more), but it definitely seems plausible under a bunch of scenarios with multiple misaligned ASIs.
Unaugmented humanity can’t be a signatory to a non-fake deal with a superintelligence, because we can’t enforce it or verify its validity. Any such “deal” would end with the superintelligence backstabbing us once we’re no longer useful. See more here.
A possible counter-proposal is to request, as part of the deal, that the superintelligence provide us with tools we can use to verify that it will comply with the deal, or tools to bind it to the deal. That also won’t work: any tools it provides us will be poisoned in some manner, guaranteed not to actually work.
Yes, even if we request that those tools be e.g. mathematically verifiable or something. They would just be optimized to exploit bugs in our proof-verifiers, or bugs in human minds that would cause us to predictably and systematically misunderstand what the tools actually do, etc. See more here.
I agree it’s not a slam dunk.
It does seem unlikely to me that humanity would credibly offer large fractions of all future resources. (So I wouldn’t put it in a scenario forecast meant to represent one of my top few most likely scenarios.)