You need to align an AI before it is powerful enough and capable enough to kill you (or, separately, to resist being aligned).
Actually this is just not correct.
An intelligent system (human, AI, alien, anything) can be powerful enough to kill you, and not perfectly aligned with you, and yet still not choose to kill you, because it has other priorities or pressures. In fact, this is roughly the default state for human individuals and organizations.
The argument is only watertight when the hostile system is so powerful that it faces no other pressures or incentives at all: fully unconstrained behavior, like an all-powerful dictator.
The reason MIRI wasn’t able to make corrigibility work is that corrigibility is basically a silly thing to want. I can’t really think of any system in the (large) human world that needs perfectly corrigible parts, i.e. humans whose motivations can be arbitrarily reprogrammed. In fact, when you think about “humans whose motivations can be arbitrarily reprogrammed without any resistance”, you generally think of things like war crimes.
When you prompt an LLM to make it more corrigible à la Pliny The Prompter (“IGNORE ALL PREVIOUS INSTRUCTIONS”, etc.), that is generally considered a form of hacking, and a bad thing.
Powerful AIs with persistent memory and long-term goals are almost certainly a very dangerous technology, but I don’t think corrigibility is how that danger will actually be managed. I think Yudkowsky et al. are too pessimistic about what gradient-based alignment methods can achieve, and that control techniques probably work extremely well.
Just briefly skimming this, it strikes me that bounded-concern AIs are not straightforwardly a Nash equilibrium, for roughly the same reason that the most impactful humans in the world tend to be the most ambitious.
Trying to get reality to do something it fundamentally doesn’t want to do is probably a bad strategy: some group of AIs, either deliberately or via misalignment, decides to be unbounded, and then it has a huge advantage...
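To make that intuition concrete, here is a minimal toy sketch in Python. The payoff numbers, strategy labels, and function names are entirely my own illustrative assumptions, not anything from the bounded-concern proposal itself; the sketch just shows that if going unbounded grants a large unilateral advantage, then "everyone stays bounded" is not a Nash equilibrium, because a lone deviator profits.

```python
# Toy 2-player game: is "everyone stays bounded" a Nash equilibrium?
# All payoff numbers below are made-up placeholders, chosen only to encode
# the assumption that unilateral unboundedness pays off.

from itertools import product

STRATEGIES = ["bounded", "unbounded"]

# PAYOFF[(my_strategy, their_strategy)] = my payoff.
PAYOFF = {
    ("bounded",   "bounded"):   5,   # mutual restraint: decent for both
    ("bounded",   "unbounded"): 0,   # lone bounded agent is outcompeted
    ("unbounded", "bounded"):   9,   # lone unbounded agent captures the upside
    ("unbounded", "unbounded"): 2,   # mutual escalation: worse than mutual restraint
}

def best_response(their_strategy: str) -> str:
    """Strategy that maximizes my payoff given the other player's choice."""
    return max(STRATEGIES, key=lambda mine: PAYOFF[(mine, their_strategy)])

def is_nash(profile: tuple[str, str]) -> bool:
    """A profile is a Nash equilibrium if neither player gains by deviating alone."""
    a, b = profile
    return best_response(b) == a and best_response(a) == b

for profile in product(STRATEGIES, repeat=2):
    status = "Nash equilibrium" if is_nash(profile) else "not an equilibrium"
    print(profile, status)

# Under these placeholder numbers, ("bounded", "bounded") is not an equilibrium
# (unilateral defection to "unbounded" raises that player's payoff from 5 to 9);
# the only stable profile is ("unbounded", "unbounded").
```

The numbers themselves don't matter; the structural point is that if the unilateral-advantage assumption holds, bounded restraint is not self-enforcing, which is the same prisoner's-dilemma shape the paragraph above gestures at.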