To point 4 and related ones, OpenAI has this on their charter page:
We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”
What about the possibility of persuading the top several biggest actors (DeepMind, FAIR, etc.) to agree to something like that? (Note that they define AGI on the page to mean “highly autonomous systems that outperform humans at most economically valuable work”.) It’s not very fleshed out, in terms of either the conditions that trigger the pledge or how the transition would go, but it’s a start. The hope would be that someone makes something “sufficiently impressive to trigger the pledge” that doesn’t quite kill us, and then ideally (a) the top actors stopping would buy us some time, and (b) the top actors devoting their people to helping out (I figure they could write test suites at a minimum) could accelerate the alignment work.
I see possible problems with this, but is this at least in the realm of “things worth trying”?
What about the possibility of persuading the top several biggest actors (DeepMind, FAIR, etc.) to agree to something like that?
My understanding is that this has been tried, at various levels of strength, ever since OpenAI published its charter. My sense is that MIRI’s idea of “safety-conscious” looks like this, which it guessed was different from OpenAI’s sense; I kind of wish that had been a public discussion back in 2018.
Given that Sam Altman has some of the shortest timelines around, I wonder if he could be persuaded that DeepMind is within 2 years of the finish line, or will be visibly within 2 years of the finish line in a few years. (Not implying that would be a solution to anything; I’m just curious what it would take for that clause to apply.)