I expect the thing that kills us, if we die, and the thing that saves us, if we are saved, to be strong/general coherent agents.
I agree in the sense that strong optimization is the likely shape of the equilibrium (though I wouldn’t go so far as to say it’s utility maximization specifically), and in that equilibrium humanity is either fine or not. Conversely, while humanity remains alive, the doom status of the eventual outcome stays in question until a strong optimization equilibrium is reached. Doom could come sooner, but the singularity is fast in physical time, so the distinction doesn’t necessarily matter.
But do you expect humans to build strong optimization? The way things are going, it’s weakly coherent AGIs that are going to build strong optimization, while any alignment-relevant things humanity can do are not going to be about alignment of strong optimization; they are instead about alignment of weakly coherent AGIs (with LLM characters as the obvious candidate for successful alignment, and much more tenuous grounds for alignability of other things).