Before building FAI you built an oracle AI to help you. With its help, you found a mathematical definition of U, the utility of humanity’s extrapolated volition (or whatever). You were all pretty pleased with yourselves, but you didn’t stop there: you found a theory of everything, located humanity within it, and wrote down the predicate F(X) = “The humans run the program described by X.”
To top it off, with the help of your oracle AI you found the code for a “best possible AI”, call it FAI, and a proof of the theorem:
There exists a constant Best such that U ≤ Best, but F(FAI) implies U = Best.
Each of these steps you did with incredible care. You have proved beyond reasonable doubt that U and F represent what you want them to.
You present your argument to the people of the world. Some people object to your reasoning, but it is airtight: if they choose to stop you from running FAI, they will still receive U ≤ Best, whereas letting it run guarantees U = Best, so why bother?
Now satisfied, and with the scheduled moment arrived, you finally run FAI. Promptly the oracle AI destroys civilization and spends the rest of its days trying to become as confident as possible that Tic-Tac-Toe is really a draw (like you asked it to, once upon a time).
- Paul F. Christiano