So if I’m understanding you correctly (and let me know if I’m not, of course, since I may be extrapolating well beyond what you intended), you’re saying that we will never solve alignment, because:
A. “Alignment” as a term relies on a conception of humanity as a sort of unified group, which doesn’t really exist, because we all have subtly or massively different fundamental goals. Aiming for “what’s best for humanity” (perhaps through Yudkowsky’s CEV or something similar) is not doable even in theory without literally changing people’s value functions to be identical (which would itself qualify as an x-risk-type scenario, imo).
B. Regardless of A, we’ve only got one shot at alignment (implying assumptions 3 and 7), and… Here I noticed my confusion, since you seem to be using a statement that relies on assumption 3 to argue for assumption 3, which seems somewhat circular, so I’m probably misunderstanding you there. By the argument you give, the situation is avoidable if there are in fact multiple chances to launch an AGI, for whatever reason.
It seems to me that A may be a restatement of the governance problem in political theory (i.e. “how can a government be maximally ethical?”). If so, I’d say the solution is simply to redefine alignment as aiming for some individual’s ethical values, which would presumably include concepts such as the value of alternative worldviews, etc. (this is just one thought; it doesn’t need to actually be The Answer™). Your objection seems to be primarily semantic in nature, and I don’t see any strong reason why it can’t be overcome by simply posing the problem better and then answering that problem.
(posting this below just to note that I ended up editing the comment above, rather than posting below as I’d previously promised, so that I could still fulfil said promise ;))