There are two different issues with “the first critical try” (the After regime), where misalignment is lethal. First, maybe alignment is sufficiently solved by the time you enter After, and that’s why it doesn’t kill you. But second, maybe After never arrives at all.
Gradualist arguments press both issues, not just the alignment one. Sufficient control makes increasingly capable AIs non-lethal even if misaligned, which means that an AI that would bring about the After regime today wouldn’t do so in a future where better countermeasures (ones that are not about alignment) are in place. Which is to say, this particular AI won’t enter the After regime yet, because the world is sufficiently different and this AI’s capabilities are no longer sufficient for lethality; an even more capable AI would be needed for that.
This is different from an ASI Pause delaying the After regime until ASI-grade alignment is solved, because the level of capabilities that counts as After keeps changing. Instead of delaying ASI at a fixed level of capabilities until alignment is solved, After keeps being pushed into the future by increasing levels of control that make increasingly capable AIs non-critical. On this picture, After never happens at all, rather than only happening once alignment at the relevant level is sufficiently solved.
(Of course the feasibility of ASI-grade control is as flimsy as the feasibility of ASI-grade alignment when working on a capabilities schedule without an AGI/ASI Pause, not to mention gradual disempowerment in the gradualist regime without an AGI Pause. But the argument is substantially different: a proponent of gradualist development of ASI-grade control might feel that there is no fixed After, and perhaps that After never actually arrives even as capabilities keep increasing. The arguments against the feasibility of gradualist development of ASI-grade alignment, on the other hand, feel like they posit a fixed After whose arrival remains inevitable at some point, which doesn’t acknowledge the framing from gradualist arguments about development of ASI-grade control.)