“In worlds where AI alignment can be handled by iterative design, we probably survive. So long as we can see the problems and iterate on them, we can probably fix them, or at least avoid making them worse.”
This is not necessarily true! AI alignment is only part of the problem; solving it doesn’t mean things automatically go well. For example, if an ASI is aligned to an individual and that individual wants to kill everyone (or kill everyone but a small class of people, or wants to enforce a hivemind merge, etc.) then we don’t survive. Or there’s the risk of gradual disempowerment.
To rephrase this in the words of Zvi from that article: “As in, in ‘Phase 1’ we have to solve alignment, defend against sufficiently catastrophic misuse and prevent all sorts of related failure modes. If we fail at Phase 1, we lose.
If we win at Phase 1, however, we don’t win yet. We proceed to and get to play Phase 2.”