powerful guardrails in place) as miracle disease cures are arriving and robot factories are rampant, and yet there is still no mention of what is happening with the open models of that day. How do we get through 2025, 2026 and 2027 with no super viruses? Or high-profile drone assassinations of political leaders?
The AI 2027 scenario predicts that no super viruses will happen in 2025-2027. This is because the open-weights AIs aren’t good enough to do it all on their own during this period, and while they could provide some uplift to humans, there aren’t that many human groups interested in building super viruses anyway.
A crux for me is that latter variable. If you could convince me that, e.g., there are 10 human groups that would love to build super viruses but lack the technical know-how that LLMs could provide (and that expertise was indeed the bottleneck, i.e. that there haven’t been any human groups over the last decade or two who had both the expertise and the motive), I’d become a lot more concerned.
As for drone assassinations: this has nothing to do with AI, especially with open-weights AI. The way to do a drone assassination is to fly the drone yourself, like they do in Ukraine. Maybe if the security is good they’ll have EW jammers, but even then you can just use a fiber-optic cable. Or maaaaybe you want to go for AI at that point, but you won’t be using LLMs; you’ll be using tiny models that can fit on a drone and primarily recognize images.
@Daniel Kokotajlo—thanks for taking the time to read this and for your thoughtful replies.
So to make sure I understand your perspective, it sounds like you believe that open models will continue to be widely available and will continue to lag about a year behind the very best frontier models for the foreseeable future. But that they will simply be so underwhelming compared to the very best closed models that nothing significant on the world stage will come of them by 2030 (the year your scenario model runs to), even with (presumably) millions of developers building on open models by that point? And that you have such high confidence in this underwhelmingness that open models are simply not worth mentioning at all. Is that all correct?
The AI 2027 scenario predicts that no super viruses will happen in 2025-2027.
Okay. I don’t buy this based on the model capability projections in your scenario. But even if we set aside 2025-2027, what about the years 2028-2030, which are by far the most exciting parts of your scenario? For example, in February 2028 of AI 2027, we have: “Preliminary tests on Safer-3 find that it has terrifying capabilities. When asked to respond honestly with the most dangerous thing it could do, it offers plans for synthesizing and releasing a mirror life organism which would probably destroy the biosphere.”
… which, based on a one-year lag for open models, would mean that by February 2029 we have open models capable of offering plans for synthesizing and releasing mirror life to basically anyone in the world. (And presumably also able to allow almost anyone in the world to make a super virus with ease, since this is a much lower lift than creating mirror life.)
Even setting aside synbio risks and other black-ball risks and considering only loss of control (which you seem to take much more seriously than other AI risks), your account still seems problematic. Even in 2026 and 2027, developers at OpenBrain and DeepCent seem seriously concerned about losing control of their models. But if we jump ahead just a year, then a loss of control of models with those same capabilities (from the year before) will be essentially guaranteed in open models, given developers willing to run them with few or no safeguards, or even AI owners intentionally handing autonomy over to the AI. Can you please explain how a rogue AI with 2027 frontier capabilities is incredibly scary in 2027 but not even worthy of mention in 2028, or how a rogue AI with 2028 frontier capabilities might be species-ending (per your “Race” branch) in 2028 but not scary at all in 2029 and 2030?
We didn’t talk about this much, but we did think about it a little bit. I’m not confident. But my take is that yeah, maybe in 2028 some minor lab somewhere releases an open-weights equivalent of the Feb 2027 model (this is not at all guaranteed btw, given what else is going on at the time, and given the obvious risks of doing so!), but at that point things are just moving very quickly. There’s an army of superintelligences being deployed aggressively into the economy and military. Any terrorist group building a bioweapon using this open-weights model would probably be discovered and shut down, as the surveillance abilities of the army of superintelligences (especially once they get access to US intelligence community infrastructure and data) would be unprecedented. And even if some terrorist group did scrape together some mirror life stuff midway through 2028… it wouldn’t even matter that much, I think, because mirror life is no longer so threatening at that point. The army of superintelligences would know just what to do to stop it, and if somehow it’s impossible to stop, they would know just what to do to minimize the damage and keep people safe as the biosphere gets wrecked.
Again, not confident in this. I encourage you to write a counter-scenario laying out your vision.
Biorisks are not the only risk.
Full ARA (autonomous replication and adaptation) might not be existential, but it might be a pain in the ass once we have full adaptation and superhuman cyber/persuasion abilities.