I find myself in a rather similar position: I wrote both my sequence on AI, Alignment and Ethics and this post (which is basically a shorter and updated version of the sixth post in that sequence, The Mutable Values Problem in Value Learning and CEV) after many years of thinking about these issues by myself (initially as world-building for a still-unpublished science fiction novel). I wrote them specifically in the hope of sparking conversations with other people about these issues; such conversation has so far been thinner on the ground than I’d hoped for, but has still been a good deal more than the previous zero.
Your explanation makes a good deal more sense now. (Incidentally, you might find my ideas in Uploading, the third post in that sequence, relevant to your interests as a would-be em.) So you were rather explicitly assuming something other than all ASI being fully aligned to current human wants and desires, as I was assuming in my post. In that case the problem doesn’t automatically go away (since humans could simply do it to themselves, given the technological means to do so, and indeed might do so even more unwisely without ASI assistance), but the biological-human side of the problem then becomes rather easy for unaligned ASI to solve (or not) if they want to: since they are not bound by alignment to the humans’ wishes, the humans changing themselves doesn’t automatically change the ASI, so the ASI’s wishes become a potential anchor. On the other hand, the ASI ems now have exactly the same issue themselves, since they have even more effective technological means to change themselves, and even fewer practical biological constraints on doing so. So I would expect two linked problems, one for the biologicals and one for the uploaded ems, and while the ems can stabilize the biological one if they want to, that doesn’t inherently stabilize them. Or are you suggesting that the latter is the computational problem they’re keeping the biologicals around to solve, which would explicitly link the two, reducing this to one shared problem?
The second is what I’m suggesting, yes. The biological humans live under approximately natural conditions, with markets to establish preferences. Those preferences are then used by calculators to set prices for things, or values, or to otherwise determine distribution for the ems. Something necessarily restrictive and approximate, but provably functional. An exotic form of a familiar thing, to be sure, and if anyone starts formalizing it then it might fall apart, or it might be solved. At present it’s just an intuition, informed by basic historical observation extrapolated very far out.
I see. I’m not sure that solves the problem for the ems, since I think the biologicals may already have one even by themselves, which the ems then copy, but it certainly slows it down. And there is now an extra step where the ems look at something the biologicals chose to change about themselves and presumably have the option to say “we don’t approve, we’re not going to adopt that, and in fact we’re going to influence the biologicals to undo it, because (given that we’re not going to adopt it) it makes them less useful to us”, so it might actually slow the process even for the biologicals. That doesn’t by itself prove that the process converges to a stable state; it might just mean it diverges more slowly. However, if the ems WANT the biologicals’ process to converge to a stable state rather than diverging, they can almost certainly arrange that it does, since fundamentally they have more power.
However, I think the whole ems situation has a different instability, which I discuss in Uploading, so I see the whole situation as already unstable, just with a different failure mode. Very briefly, ems are easy to upgrade, and baseline human moral intuitions and ethical behaviors are not well calibrated for a situation in which some people have orders of magnitude more capability than others: humans are not actually aligned, they’re merely good at allying between approximate equals, and once you start adding large capability differences between humans in a society, things go badly. So if ems upgrade, you either need to keep their capabilities similar, or change their ethics/behavior enough that this isn’t a problem any more, or put a lot of social controls in place to prevent problems. So basically, ems/uploads have a problem comparable to the AI alignment problem, which would similarly need to be solved first before this even became a potential problem.