A (EtA: quick) note on terminology: AI Alignment != AI x-safety
I think the terms “AI Alignment” and “AI existential safety” are often used interchangeably, which leads people to conflate the two ideas.
I think “AI Alignment” should be used exclusively for Intent Alignment, i.e. sense (2) below (with some vagueness about whose intent, e.g. designer vs. user). In practice, it is mostly used in one of the following three ways:
1) AI Alignment = How to get AI systems to do what we want
2) AI Alignment = How to get AI systems to try to do what we want
3) AI Alignment = A rebranding of “AI (existential) safety”… A community of people trying to reduce the chance of AI leading to premature human extinction.
The problem with (1) is that it is too broad, and invites the response: “Isn’t that what most/all AI research is about?”
The problem with (3) is that it suggests that (Intent) Alignment is the one-and-only way to increase AI existential safety.
Some reasons not to conflate (2) and (3):
The case that increasing (intent) alignment increases x-safety seems much weaker on the margin than in the limit; the main effect of a moderate increase in intent alignment might simply be a large increase in demand for AI.
Even perfect intent alignment doesn’t necessarily result in a safe outcome; e.g. if everyone woke up 1,000,000x smarter tomorrow, the world might end by noon.
X-safety can be increased through non-technical means, e.g. governance/coordination.
EtA: X-safety can be increased through technical work other than alignment, e.g. assurance methods such as value alignment verification.
In my experience, this sloppy use of terminology is common in this community, and leads to incorrect reasoning (if not in those using it, then certainly at least sometimes in those hearing/reading it).
EtA: This Tweet and associated paper make a similar point: https://twitter.com/HeidyKhlaaf/status/1634173714055979010