I don’t see any analog to mutually assured destruction, which seems like a pretty key feature with nukes.
Perhaps the appropriate analogy here would be two teams which both say “The other team is going to get to AI first if we don’t, and we prefer misalignment to losing, so we might as well push ahead.” The disanalogy here is that it’s not adversarial in the sense of being destructive (although it could be if they are enemies). But it’s analogous in the sense that they could either both decide to do nothing, or both decide to take the action. If they decide to take the action, they will both ensure their own destruction in the case of misalignment.
This still feels more analogous to Chernobyl? “The other team is going to get cheap nuclear energy first if we don’t, and we prefer a nuclear accident to losing, so we might as well push ahead.”
You might argue that obviously it doesn’t matter very much who gets nuclear energy first, so this wouldn’t apply. I’d respond that the benefit:cost ratio here seems similar to the benefit:cost ratio for AI, where the benefit is “we build a singleton” and the cost is “misaligned AGI causes extinction”. Surely it’s significantly better for the other team to win and build a singleton than for you to build a misaligned AGI?
(Separately, I think I would argue that the “we build a singleton” case is unlikely, but that’s not a crucial part of this argument.)