I think part of the reason for confidence is that any AI weak enough to be safe without being aligned is also weak enough that it can’t do much; in particular, it can’t do anything that a committed group of humans couldn’t do without it. In other words, if you can name such an act, then you don’t need the AI to make the pivotal moves. And if you know how, as a human or group of humans, to take an action that reliably stops future, not-yet-existing AGI from destroying the world, without the action itself destroying the world, then in a sense haven’t you solved alignment already?