This is well written, easy to understand, and I largely agree that instilling a value like love for humans in general (as individuals) could deal with an awful lot of failure modes. It does so amongst humans already (though far from perfectly).
When there is a dispute, rather than optimizing over smiles in a lifetime (a proxy for long-term happiness), something more demanding is preferable: if the versions of the person in both the world where the action happened and the world where it did not would end up agreeing that it was better to have happened, and that it would have been better to force the issue, then it might make sense to override the human's current preference. Since the future is not known, such determinations are necessarily probabilistic, but the thresholds should be set quite high. The vast majority of adults agree that they should have been vaccinated against many diseases as children, so the probability that a child would later agree is quite high.
Smiles in a lifetime remains a good proxy for what an aligned intelligence, artificial or not, should want for those it loves once multiple actions are within acceptable bounds, whether by the criterion above or because the person currently prefers the action and would approve of it in the world where it happens.
Requiring two out of three versions of the person to approve (the current self, the future self in the world where the action happens, and the future self in the world where it does not) is only complicated when the one dissenting version is the one who actually lives in the world where it happens.
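For concreteness, here is a minimal sketch of that override rule in Python. Every name in it is hypothetical; in particular, the approval probabilities stand in for whatever predictive model an actual system would use to estimate retrospective endorsement, which is the genuinely hard part.

```python
# Hypothetical sketch of the override rule described above.
# The approval probabilities are assumed inputs from some predictive
# model of the person's future counterfactual selves.

THRESHOLD = 0.95  # deliberately high: only override when approval is near-certain


def should_override(current_prefers_action: bool,
                    p_approve_if_done: float,
                    p_approve_if_not_done: float) -> bool:
    """Override the person's current preference only if both future
    counterfactual versions (the one in the world where the action
    happens and the one in the world where it does not) would very
    likely agree it was better to have happened."""
    if current_prefers_action:
        return False  # no dispute, so nothing to override
    return (p_approve_if_done >= THRESHOLD
            and p_approve_if_not_done >= THRESHOLD)


# Example: childhood vaccination. The adult in either world almost
# certainly endorses having been vaccinated, so the child's current
# refusal clears the bar for being overridden.
print(should_override(current_prefers_action=False,
                      p_approve_if_done=0.98,
                      p_approve_if_not_done=0.98))  # True
```

Among the actions that pass this test, or that the person currently prefers and would later endorse, smiles in a lifetime could then serve as the tiebreaking objective.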