A number of blogs (e.g., LessWrong, the Alignment Forum) seem to treat [AI existential safety, AI alignment, and AI safety] as near-synonyms, and I think that is a mistake, at least when it comes to guiding technical work for existential safety.
I strongly agree with the benefits of having separate terms and generally like your definitions.
> In this post, AI existential safety means “preventing AI technology from posing risks to humanity that are comparable or greater than human extinction in terms of their moral significance.”
I like “existential AI safety” as a term to distinguish from “AI safety” and agree that it seems to be clearer and have more staying power. (That said, it’s a bummer that “AI existential safety forum” is a bit of a mouthful.)
If I read that term without a definition I would assume it meant “reducing the existential risk posed by AI.” Hopefully you’d be OK with that reading. I’m not sure if you are trying to subtly distinguish it from Nick’s definition of existential risk or if the definition you give is just intended to be somewhere in that space of what people mean when they say “existential risk” (e.g. the LW definition is like yours).
Good to hear!

> If I read that term [“AI existential safety”] without a definition I would assume it meant “reducing the existential risk posed by AI.” Hopefully you’d be OK with that reading. I’m not sure if you are trying to subtly distinguish it from Nick’s definition of existential risk or if the definition you give is just intended to be somewhere in that space of what people mean when they say “existential risk” (e.g. the LW definition is like yours).

Yep, that’s my intention. If given the chance I’d also shift the meaning of “existential risk” a bit away from Bostrom’s and a bit toward a more naive meaning of the term, but that’s a separate objective :) Specifically, if I got to rewrite Nick’s terminology (which might be too late now that it’s on Wikipedia), I’d say “existential risk” should mean “a risk to the existence of humanity” and “existential-level risk” should mean “a risk that is as morally significant as a risk to the existence of humanity” (which, roughly speaking, is what Bostrom currently calls “existential risk”).