it would still be extremely hard to extinguish humanity completely.
How difficult do you expect it would be to build mirror bacteria and how lethal would this be to human civilization?
My sense is that a small subset of bio experts (e.g. 50) aimed at causing maximum damage would in principle be capable of building mirror bacteria (if not directly stopped[1]) and this would most likely cause the deaths of the majority of humans and would have a substantial chance of killing >95% (given an intentional effort to make the deployment as lethal as possible).
I don’t have a strong view on whether this would kill literally everyone, at least in the short run and given access to advanced AI (it seems sensitive to various tricky questions), but I don’t think this is a crux.
Perhaps your claim is that preventing a rogue AI from doing this (or similar attacks) seems easy even if it could do it without active prevention? Or that it wouldn’t be incentivized to do this?
Similarly, causing a pandemic which is extremely costly for the global economy doesn’t appear to be that hard based on COVID, at least insofar as you think lab leak is plausible.
For instance, killing a thousand people in a single terror attack is much harder than killing the same number in many smaller attacks.
I don’t think the observational evidence you have supports this well. It could just be that terrorists see diminishing returns (in terms of their goals) from killing more people, and that killing more is only sublinearly harder. (Given COVID, I’d guess that terrorists would have much more effective strategies for causing large numbers of expected fatalities, particularly if they don’t care about causing specific responses or being attributed.)
FWIW, I agree with a bottom-line claim like “winning a war against a rogue AI seems potentially doable, including a rogue AI which is substantially more capable than humans”. The chance of success here of course depends heavily on what AIs the defenders can effectively utilize (given that the AIs we possess might be misaligned).
And, my guess is that in the limit of capabilities, defense isn’t substantially disadvantaged relative to offense.
(Though this might require very costly biodefense measures like having all humans live inside a BSL-4 lab or become emulated minds, see also here. Or, at a more extreme level, it might not be that hard to destroy specific physical locations, such that evading weaponry requires moving quickly and unpredictably; this rules out living on a fixed target like Earth, and rules out anything that can’t cope with near-maximal g-forces, which in turn rules out existing as a biological human. All of these could in principle be addressed by prevention—don’t allow a rogue superintelligence to exist for long or to do much—but I thought we were talking about the case where we just try to fight a war without this level of prevention.)
I also think that even if a rogue AI were to successfully take over, it probably wouldn’t kill literally every human.
My sense is that a small subset of bio experts (e.g. 50) aimed at causing maximum damage would in principle be capable of building mirror bacteria (if not directly stopped[1]) and this would most likely cause the deaths of the majority of humans and would have a substantial chance of killing >95% (given an intentional effort to make the deployment as lethal as possible).
I currently think this is false (unless you have a bunch of people making repeated attempts after seeing it only kill a small-ish number of people). I expect the mirror bacteria thing to be too hard, and to ultimately be counteractable.
I picked the mirror bacteria case as I thought it was the clearest public example of a plausibly existential, or at least very nearly existential, biothreat. My guess is there are probably substantially easier but less well-specified paths.
mirror bacteria thing to be too hard
For 50 experts given 10 years and substantial funding? Maybe the 10 years part is important—my sense is that superhuman AI could make this faster so I was thinking about giving the humans a while.
to ultimately be counteractable
How do you handle all/most large plants dying? Do you genetically engineer new ones with new immune systems? How do you deploy this fast enough to avoid total biosphere/agricultural collapse?
I think keeping humans alive with antibiotics might be doable, at least for a while, but this isn’t the biggest problem you’d have.
I don’t currently think it’s plausible, FWIW! Agree that there are probably substantially easier and less well-specified paths.
I am not a bio expert, but generally think that:
1. The offense/defense ratio is not infinite. If you have the intelligence of 50 bio experts trying to cause as much damage as possible, and the intelligence of 5000 bio experts trying to foresee and prepare for any such cases, I think we have a good shot.
2. The offense/defense ratio is not really constant—if you want to destroy 99% of the population, it is likely to be 10x (or maybe more—getting tails is hard) harder than destroying 90%, etc.
I don’t know much about mirror bacteria (and whether it is possible to have mirror antibiotics, etc.), but I have not seen a reason to think that this case shows the offense/defense ratio is infinite.
As I mention, in an extreme case, governments might even lock people down in their houses for weeks or months, distribute gas masks, etc., while they work out a solution. It may have been unthinkable when Bostrom wrote his vulnerable world hypothesis paper, but it is not unthinkable now.
I agree that the ratio is not infinite and not constant, but I do think the offense/defense ratio for killing 90% is probably larger than 10x, and plausibly much larger than 100x, for some intermediate period of technological development (given realistic societal adaptation and response).
As I mention, in an extreme case, governments might even lock people down in their houses for weeks or months, distribute gas masks, etc., while they work out a solution. It may have been unthinkable when Bostrom wrote his vulnerable world hypothesis paper, but it is not unthinkable now.
It’s worth noting that for the case of mirror bacteria in particular, this exact response wouldn’t be that helpful and might be actively bad. I agree that a very strong government response to clear ultra-lethal bioweapon deployment is pretty likely.
I don’t know much about mirror bacteria (and whether it is possible to have mirror antibiotics, etc.), but I have not seen a reason to think that this case shows the offense/defense ratio is infinite.
I think it would plausibly require >5000 bio experts to prevent >30% of people dying from mirror bacteria well designed for use as bioweapons. There are currently no clear stories for full defense from my perspective, so it would require novel strategies. And the stories I’m aware of for keeping a subset of humans alive seem at least tricky. Mirror antibiotics are totally possible and could be manufactured at scale, but this wouldn’t suffice to prevent most large plant life from dying, which would cause problems. If we suppose that 50 experts could make mirror bacteria, then I think the offense-defense imbalance could be well over 100x?
The offense/defense ratio is not really constant—if you want to destroy 99% of the population, it is likely to be 10x (or maybe more—getting tails is hard) harder than destroying 90%, etc.
For takeover, you might only need 90% or less, depending on the exact situation, the AI’s structural advantages, and the affordances granted to a defending AI. Regardless, I don’t think “well sure, the misaligned AI will probably be defeated even if it kills 90% of us” will be much comfort to most people.
While I agree that 99% is harder than 90%, I think the difference is probably more like 2x than 10x, and I don’t think the curve of effort vs. log fraction destroyed is going to have a constant slope. (For one thing, once a sufficiently small subset remains, a small fraction of resources suffices to outrace the remainder economically. If the AI destroys 99.95% of its adversaries and was previously controlling 0.1% of resources, this would suffice for outracing the rest of the world and becoming the dominant power, likely gaining the ability to destroy the remaining 0.05% with decent probability if it wanted to.)
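To make the arithmetic in that parenthetical concrete, here is a toy calculation. It leans on a strong simplifying assumption (that each side’s usable resources scale directly with its surviving share), and the function name and numbers are illustrative rather than anything endorsed in the thread.

```python
def resource_ratio(ai_share: float, frac_destroyed: float) -> float:
    """Toy model: the AI starts with `ai_share` of world resources and destroys
    `frac_destroyed` of its adversaries' resources; returns the ratio of the AI's
    resources to the adversaries' remaining resources."""
    adversary_share = 1.0 - ai_share
    remaining_adversary_share = adversary_share * (1.0 - frac_destroyed)
    return ai_share / remaining_adversary_share

# The example above: the AI controls 0.1% of resources and destroys 99.95% of its adversaries.
print(resource_ratio(0.001, 0.9995))  # ~2.0, i.e. roughly a 2:1 resource advantage
```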
Since I am not a bio expert, it is very hard for me to argue about these types of hypothetical scenarios. I am not at all sure that intelligence is even the bottleneck here, whether on the defense or the offense side.
I agree that killing 90% of people is not very reassuring; this was more a general point about why I expect the effort-to-damage curve to be a sigmoid rather than a straight line.
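As a purely illustrative picture of that sigmoid claim (the functional form and parameters here are my own assumption, not something from the thread), one could write the fraction destroyed $D$ as a logistic function of effort $x$:

$$D(x) = \frac{1}{1 + e^{-k(x - x_0)}}$$

where $x_0$ is the effort needed for 50% destruction and $k$ sets the steepness. On such a curve, the marginal effort per additional percentage point destroyed grows rapidly in the upper tail (going from 90% to 99% costs far more than going from 50% to 59%), whereas a straight line would imply constant marginal cost.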
[1] To be clear, it might be pretty easy for US intelligence agencies to stop such an effort specifically for mirror bacteria, at least if they tried.