They both look like they need to happen in a one-shot scenario of this kind… (That’s more or less common to all scenarios involving superintelligence.)
If we do it right, ASIs will care about what we think, but if we screw it up, we won’t be able to intervene.
But that’s not the hardest constraint; the hardest constraint is that “true solutions” need to survive an indefinitely long period of drastic evolution and self-modification/self-improvement.
This constraint eliminates most of the solution candidates. Something might look plausible, but if it is not designed to survive drastic self-modifications, it will not work. As far as I can see, all that remains viable is the set of potential solutions which are driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and are therefore non-anthropocentric, but which are formulated in such a fashion that humans belong in the “circle of care”, and the “circle of care” has the property that it can only expand, never contract.
(For example, “rights and interests of all individuals regardless of the nature of the individual”, “rights and interests of all sentient beings regardless of the nature of that sentience”, and so on: situations where it might be possible to have a natural “protected class of beings” which would include both ASIs and humans. Something like that might plausibly work. I recently started to call this approach “modest alignment”.)
That’s where one might be able to find something which could actually work (in particular, one needs the property that the setup auto-corrects errors rather than amplifying them, and one needs the property that the chance of failure per fixed unit of time tends to zero quickly enough that the failure probability accumulating over time doesn’t kill us).
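A minimal sketch of what “quickly enough” would have to mean here (my illustrative formalization, assuming $p_t < 1$ is the conditional chance of catastrophic failure in period $t$, given survival up to that point):

$$\Pr(\text{survive all periods}) \;=\; \prod_{t=1}^{\infty} (1 - p_t) \;>\; 0 \quad\Longleftrightarrow\quad \sum_{t=1}^{\infty} p_t \;<\; \infty.$$

For instance, $p_t \propto 1/t^2$ leaves a positive long-run survival probability, whereas $p_t \propto 1/t$ also tends to zero yet drives the survival probability to zero; the per-period failure chance has to shrink fast enough to be summable, not merely tend to zero.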
Agree: either we have a ludicrously broad basin for alignment and it’s easy (likely not requiring much work from us), or we almost certainly fail because the target is narrow, we get only one shot, and the solution needs to survive enormous pressures over time.
Yes.
I think this depends a lot on the quality of the “society of ASIs”. If they are nasty to each other, compete ruthlessly with each other, are on the brink of war among themselves, and are careless with the dangerous superpowers they have, then our chances with that kind of ASIs are about zero (their own chances of survival are also very questionable in such a situation, given the supercapabilities involved).
If the ASIs competently address their own existential risks of destroying themselves and their neighborhood, and their society is “decent”, our chances might be quite reasonable in the limit (the transition period is still quite risky and unpredictable).
So, to the extent that it depends at all on what we do, we should perhaps spend a good chunk of the AI existential safety research effort on what we can do during the period of ASI creation to increase the chances of their society being sustainably decent. They should be able to take care of that on their own, but the initialization conditions might matter a lot.
The rest of the AI existential safety research effort should probably focus on 1) making sure that humans are robustly included in the “circle of care” (conditional on the ASI society being decent to its own members, which should make this much more tractable), and 2) the uncertainties of the transition period. The transition period, with its intricate balances of power and great uncertainties, is much harder to understand: it’s one thing to solve the problem in the limit, but much more difficult to handle the uncertain “gray zone” in between. That’s what worries me the most; it’s the nearest period in time, and the least understood.