Not trying to convince you of anything, but my personal issue is with 4 and 9. I am not certain that a superintelligence with behaviors incomprehensible to us (I would not presume these can be derived from anything like “values” or “goals”, since that framing doesn’t even work reliably for humans) would necessarily wipe humanity out. I see plenty of other options, including far-fetched ones like creating its own baby universes, or miniaturizing into some quantum world, or, most likely, something we can’t even conceive of, just as chimps can’t conceive of space or algebra.
Other than that, my guess is that creating an aligned intelligence is not even a well-posed problem, since humans are not internally aligned themselves, not even on the question of whether the survival of humanity is a good thing. And even if it were well-posed, unless there is some magic “alignment attractor” built into the universe, there is basically no chance we could create an aligned entity on purpose. By analogy with “rocket alignment”: rockets blow up quite a bit before they ever fly, and odds are there is only one chance to launch an aligned AI. So your point 3 is unavoidable, and we do not have a hope in hell of containing anything smarter than us.
The problem is that humanity’s own behavior will wipe humanity out: if the first AGI miniaturizes into some quantum world, we will just create a second one.
It’s possible, but it is also possible that at some threshold of intelligence it finds a pathway far richer and more interesting than anything we observe as humans (compare it to earthworms knowing nothing but dirt) and leaves for greener pastures.
I mean that if that’s what happens, we will redefine intelligence and try to build something that doesn’t leave.
So if I’m understanding you correctly (and let me know if I’m not, of course, since I may be extrapolating way beyond what you intended), you’re saying that we will never solve alignment, because:
A. “Alignment” as a term relies on a conception of humanity as a unified group that doesn’t really exist, because we all have either subtly or massively different fundamental goals. Aiming for “what’s best for humanity” (perhaps through Yudkowsky’s CEV or something similar) is not doable even in theory without literally changing people’s value functions to be identical (which would itself qualify as an x-risk-type scenario, imo). (There’s a toy sketch of this value-divergence point below, after B.)
B. Regardless of A, we’ve only got one shot at alignment (implying assumptions 3 and 7), and… here I notice my confusion, since you seem to be using a statement that relies on assumption 3 to argue for 3, which seems circular, so I’m probably misunderstanding you. By the argument you give, the situation is in fact avoidable if there are multiple chances to launch an AGI for whatever reason.
It seems to me that A may be a restatement of the governance problem in political theory (i.e., “how can a government be maximally ethical?”). If so, I’d say the solution is to simply redefine alignment as aiming for some particular individual’s ethical values, which would presumably include concepts such as the value of alternative worldviews, etc. (this is just one thought; it doesn’t need to actually be The Answer™). Your objection seems to be primarily semantic in nature, and I don’t see any strong reason why it can’t be overcome by simply posing the problem better and then answering that problem.
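To make the value-divergence point in A concrete, here is a minimal toy sketch (my own illustration, with made-up agents and numbers, not anything from the original argument): two people whose value functions over the same outcomes disagree, so the “best for humanity” choice under one aggregation rule is optimal for neither of them.

```python
# Toy model of the "humans aren't internally aligned" objection:
# two people value the same outcomes differently, so "best for
# humanity" is undefined until you pick an aggregation rule --
# and picking that rule is itself a value judgment.

outcomes = ["expand", "preserve", "leave_alone"]

# Hypothetical value functions (made-up numbers, purely illustrative).
alice = {"expand": 1.0, "preserve": 0.0, "leave_alone": 0.6}
bob = {"expand": 0.0, "preserve": 1.0, "leave_alone": 0.6}

best_for_alice = max(outcomes, key=alice.get)  # "expand"
best_for_bob = max(outcomes, key=bob.get)      # "preserve"
assert best_for_alice != best_for_bob          # they genuinely disagree

# One arbitrary aggregation rule: average the two value functions.
def aggregate(outcome):
    return (alice[outcome] + bob[outcome]) / 2

best_overall = max(outcomes, key=aggregate)
print(best_overall)  # "leave_alone" -- optimal for neither person
```

The arithmetic is trivial; the point is that the choice of `aggregate` (simple averaging here, chosen purely for illustration) is doing the ethical work, which is what makes “align to humanity” underspecified without further assumptions.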
(Posting below just to note that I ended up editing the above comment instead of posting below as I’d previously promised, so that I could fulfil said promise ;))