(Short writeup for the sake of putting the idea out there)
AI x-risk people often compare coordination around AI to coordination around nukes. If we ignore military applications of AI and restrict ourselves to misalignment, this seems like a weird analogy to me:
With technical AI safety we’re primarily thinking about accident risks, whereas nukes are deliberately weaponized.
Everyone can agree that we don’t want nuclear accidents, so why can’t everyone agree we don’t want AI accidents? I think the standard response here is “everyone will trade off safety for capabilities”, but did that happen with nukes?
I don’t see any analog to mutually assured destruction, which seems like a pretty key feature with nukes.
Perhaps a more appropriate nuclear analogy for AI x-risk would be accidents like Chernobyl.
There is a nuclear analog for accident risk. A quote from Richard Hamming (https://en.wikipedia.org/wiki/Richard_Hamming#Manhattan_Project):
Shortly before the first field test (you realize that no small scale experiment can be done—either you have a critical mass or you do not), a man asked me to check some arithmetic he had done, and I agreed, thinking to fob it off on some subordinate. When I asked what it was, he said, “It is the probability that the test bomb will ignite the whole atmosphere.” I decided I would check it myself! The next day when he came for the answers I remarked to him, “The arithmetic was apparently correct but I do not know about the formulas for the capture cross sections for oxygen and nitrogen—after all, there could be no experiments at the needed energy levels.” He replied, like a physicist talking to a mathematician, that he wanted me to check the arithmetic not the physics, and left. I said to myself, “What have you done, Hamming, you are involved in risking all of life that is known in the Universe, and you do not know much of an essential part?” I was pacing up and down the corridor when a friend asked me what was bothering me. I told him. His reply was, “Never mind, Hamming, no one will ever blame you.”
I don’t really know what this is meant to imply? Maybe you’re answering my question of “did that happen with nukes?”, but I don’t think an affirmative answer means that the analogy starts to work.
I think the nukes-AI analogy is used to argue “people raced to develop nukes despite their downsides, so we should expect the same with AI”; the magnitude/severity of the accident risk is not that relevant to this argument.
I think the nukes-AI analogy is used to argue “people raced to develop nukes despite their downsides, so we should expect the same with AI”
If you’re arguing against that, I’m still not sure what your counter-argument is. To me, the argument is: the upsides of nukes are the ability to take over the world (militarily) and to defend against such attempts. The downsides include risks of local and global catastrophe. People raced to develop nukes because they judged the upsides to be greater than the downsides, in part because they’re not altruists and longtermists. It seems like people will develop potentially unsafe AI for analogous reasons: the upsides include the ability to take over the world (militarily or economically) and to defend against such attempts, and the downsides include risks of local and global catastrophe, and people will likely race to develop AI because they judge the upsides to be greater than the downsides, in part because they’re not altruists and longtermists.
Where do you see this analogy breaking down?
I’m more sympathetic to this argument (which is a claim about what might happen in the future, as opposed to what is happening now, which is the analogy I usually encounter, though possibly not on LessWrong). I still think the analogy breaks down, though in different ways:
There is a strong norm of openness in AI research (though that might be changing). (Though perhaps this was the case with nuclear physics too.)
There is a strong anti-government / anti-military ethic in the AI research community. I’m not sure what the nuclear analog is, but I’m guessing it was neutral or pro-government/military.
Governments are staying a mile away from AGI; their interest in AI is in narrow AI’s applications. Narrow AI applications are diverse, and many can be done by a huge number of people. In contrast, nukes are a single technology, governments were interested in them, and only a few people could plausibly build them. (This is relevant if you think a ton of narrow AI could be used to take over the world economically.)
OpenAI / DeepMind are not adversarial towards each other. In contrast, US / Germany were definitely adversarial.
Assuming you agree that people are already pushing too hard for progress in AGI capability (relative to what’s ideal from a longtermist perspective), I think the current motivations for that are mostly things like money, prestige, scientific curiosity, wanting to make the world a better place (in a misguided/shorttermist way), etc., and not so much wanting to take over the world or to defend against such attempts. This seems likely to persist in the near future, but my concern is that if AGI research gets sufficiently close to fruition, governments will inevitably get involved and start pushing it even harder due to national security considerations. (Recall that the Manhattan Project started about three years before the detonation of the first nuke.) Your argument seems more about what’s happening now, and does not really address this concern.
you agree that people are already pushing too hard for progress in AGI capability (relative to what’s ideal from a longtermist perspective)
I’m uncertain, given the potential for AGI to be used to reduce other x-risks. (I don’t have strong opinions on how large other x-risks are and how much potential there is for AGI to differentially help.) But I’m happy to accept this as a premise.
Your argument seems more about what’s happening now, and does not really address this concern.
I think what’s happening now is a good guide to what will happen in the future, at least on short timelines. If AGI is >100 years away, then sure, a lot will change and current facts are relatively unimportant. If it’s <20 years away, then current facts seem very relevant. I usually focus on the shorter timelines.
For min(20 years, time till AGI), for each individual trend I identified, I’d weakly predict that trend will continue (except perhaps openness, because that’s already changing).
It wasn’t meant as a reply to a particular thing—mainly I’m flagging this as an AI-risk analogy I like.
On that theme, one thing “we don’t know if the nukes will ignite the atmosphere” has in common with AI-risk is that the risk is from reaching new configurations (e.g. temperatures of the sort you get out of a nuclear bomb inside the Earth’s atmosphere) that we don’t have experience with. Which is an entirely different question than “what happens with the nukes after we don’t ignite the atmosphere in a test explosion”.
I like thinking about coordination from this viewpoint.
For me it’s because:
Nukes seem like an obvious x-risk
People mostly seem to agree that we haven’t done a good job coordinating around them
They seem a lot easier to coordinate around
Also, not a reason, but:
AI seems likely to be weaponized, and warfare (whether conventional or not) seems like one of the areas where we should be most worried about “unbridled competition” creating a race-to-the-bottom on safety.
TBC, I think climate change is probably an even better analogy.
And I also like to talk about international regulation in general, like with tax havens.
Agree that climate change is a better analogy.
Disagree that nukes seem easier to coordinate around—there are factors that suggest this (e.g. easier to track who is and isn’t making nukes), but there are factors against as well (the incentives to “beat the other team” don’t seem nearly as strong).
incentives to “beat the other team” don’t seem nearly as strong
You mean it’s stronger for nukes than for AI? I think I disagree, but it’s a bit nuanced. It seems to me (as someone very ignorant about nukes) like with current nuclear tech you hit diminishing returns pretty fast, but I don’t expect that to be the case for AI.
Also, I’m curious if weaponization of AI is a crux for us.
I’m uncertain about weaponization of AI (and did say “if we ignore military applications” in the OP).
Oops, missed that, sorry.
I agree that the coordination games between nukes and AI are different, but I still think that nukes make for a good analogy. But not after multiple parties have developed them. Rather, I think a key element of the analogy is the game-changing, decisive strategic advantage that nukes/AI grant once one party develops them. There aren’t too many other technologies that have that property. (Maybe the bronze-to-iron age transition?)
Where the analogy breaks down is with AI safety. If we get AI safety wrong, there’s a risk of large, permanent negative consequences. A better analogy might be living near the end of WW2, but where, if you build a nuclear bomb incorrectly, it ignites the atmosphere and destroys the world.
In either case, under this model, you end up with the following outcomes:
(A): Either party incorrectly develops the technology
(B): The other party successfully develops the technology
(C): My party successfully develops the technology
and generally a preference ordering of A<B<C, although a sufficiently cynical actor might have B<A<C.
If there’s a sufficiently shallow trade-off between speed of development and the risk of error, this can lead to a dollar-auction-like dynamic where each party is incentivized to trade a bit more risk in order to develop the technology first. In a symmetric situation without coordination, the Nash equilibrium is all parties advancing as quickly as possible to develop the technology and throwing caution to the wind.
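To make that dynamic concrete, here is a minimal toy model (the payoffs A < B < C, the size of the outbidding step, and the assumption that a party deciding whether to keep racing treats its rival’s current risk level as fixed are all illustrative assumptions of mine, not claims from this thread):

```python
# Toy model of the escalation: each party leapfrogs the other by accepting a
# bit more accident risk, as long as winning at that risk beats dropping out.
A = -100.0   # either party develops the technology incorrectly (catastrophe)
B = 1.0      # the other party develops it successfully
C = 10.0     # my party develops it successfully
STEP = 0.01  # extra accident risk accepted in order to leapfrog the leader

def win_value(risk):
    """Expected utility of developing first while accepting `risk` chance of error."""
    return (1 - risk) * C + risk * A

def concede_value(rival_risk):
    """Expected utility of dropping out while the rival develops at their current risk."""
    return (1 - rival_risk) * B + rival_risk * A

risk = 0.0  # accident risk currently accepted by whichever party is ahead
while True:
    bid = risk + STEP
    if bid > 1 or win_value(bid) <= concede_value(risk):
        break   # leapfrogging no longer pays; the trailing party drops out
    risk = bid  # the trailing party outbids and the roles swap

print(f"escalation stops at accident risk ~= {risk:.2f}")  # ~0.88 with these numbers
```

With a shallow trade-off (a small step relative to C - B), the accepted risk gets bid up most of the way to 1 before dropping out becomes the better option, which is roughly the dynamic debated in the replies below.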
In a symmetric situation without coordination, the equilibrium is all parties advancing as quickly as possible to develop the technology and throwing caution to the wind.
Really? It seems like if I’ve raised my risk level to 99% and the other team has raised their risk level to 98% (they are slightly ahead), one great option for me is to commit to not developing the technology and let the other team develop it at risk level ~1%. This gets me an expected utility of 0.99B + 0.01A, which is probably better than the 0.01C + 0.99A that I would otherwise have gotten (assuming I developed the technology first).
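For concreteness, here is the same comparison with made-up utilities plugged in (the numbers for A, B, and C are illustrative assumptions; only the probabilities come from the comment above):

```python
# Expected-utility check of "commit to not developing" vs "race ahead at 99% risk".
A = -100.0  # a catastrophic accident
B = 1.0     # the other team develops the technology successfully
C = 10.0    # my team develops the technology successfully

eu_drop_out = 0.99 * B + 0.01 * A    # rival, no longer racing, develops at ~1% risk
eu_race_ahead = 0.01 * C + 0.99 * A  # I win the race at my current 99% risk level

print(f"drop out: {eu_drop_out:.2f}, race ahead: {eu_race_ahead:.2f}")
# drop out: -0.01, race ahead: -98.90
# Dropping out is better whenever 0.99 * (B - A) > 0.01 * (C - A),
# i.e. essentially always once the accident outcome A is catastrophic.
```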
I am assuming common knowledge here, but I am not assuming coordination. See also the OpenAI Charter.
Interesting. I had the Nash equilibrium in mind, but it’s true that unlike a dollar auction, you can de-escalate, and when you take into account how your opponent will react to you changing your strategy, doing so becomes viable. But then you end up with something like a game of chicken, where ideally, you want to force your opponent to de-escalate first, as this tilts the outcomes toward option C rather than B.
Yeah, nuclear power is a better analogy than weapons, but I think the two are linked, and the link itself may be a useful analogy, because risk/coordination is affected by the dual-use nature of some of the technologies.
One thing that makes non-proliferation difficult is that nations legitimately want nuclear facilities because they want to use nuclear power, but ‘rogue states’ that want to acquire nuclear weapons will also claim that this is their only goal. How do we know who really just wants power plants?
And power generation comes with its own risks. Can we trust everyone to take the right precautions, and if not, can we paternalistically restrict some organisations or states that we deem not capable enough to be trusted with the technology?
AI coordination probably has these kinds of problems to an even greater degree.
Opposition to and heavy regulation of nuclear reactors is mostly about accidents, not weapons (though at least some of the effort that goes into tracking the material is about weapons). Everyone agrees we don’t want accidents; not everyone agrees how much we should give up to prevent 100% of accidents. We have, in fact, had significant accidents.
Also, accidents with weapons are definitely a thing. Human regulation and cooperation is an unsolved problem, so even the distinction between accident and intent is somewhat hard to define for many group activities.
I agree with this; I’m not sure what point you’re trying to make?
Perhaps you’re suggesting that the fact that it’s accident risk rather than weapons risk doesn’t mean that we’re safe, in which case I agree. I’m only suggesting that people stop using the analogy to nukes because it’s misleading; I’m not saying that there’s no risk as a result.
I don’t see any analog to mutually assured destruction, which seems like a pretty key feature with nukes.
Perhaps the appropriate analogy here would be two teams which both say “The other team is going to get to AI first if we don’t, and we prefer misalignment to losing, so we might as well push ahead.” The disanalogy here is that it’s not adversarial in the sense of being destructive (although it could be if they are enemies). But it’s analogous in the sense that they could either both decide to do nothing, or both decide to take the action. If they decide to take the action, they will both ensure their own destruction in the case of misalignment.
This still feels more analogous to Chernobyl? “The other team is going to get cheap nuclear energy first if we don’t, and we prefer a nuclear accident to losing, so we might as well push ahead.”
You might argue that obviously it doesn’t matter very much who gets nuclear energy first, so this wouldn’t apply. I’d respond that the benefit:cost ratio here seems similar to the benefit:cost ratio for AI where the benefit is “we build a singleton” and the cost is “misaligned AGI causes extinction”. Surely it’s significantly better for the other team to win and build a singleton than for you to build a misaligned AGI?
(Separately, I think I would argue that the “we build a singleton” case is unlikely, but that’s not a crucial part of this argument.)