Meta-ethical relativists, in general, believe that the descriptive properties of terms such as “good”, “bad”, “right”, and “wrong” are not subject to universal truth conditions, but only to societal convention and personal preference. Given the same set of verifiable facts, some societies or individuals will have a fundamental disagreement about what one ought to do, based on societal or individual norms, and one cannot adjudicate these disagreements using some independent standard of evaluation. Any such standard will always be societal or personal and not universal, unlike, for example, the scientific standards for assessing temperature or for determining mathematical truths.
And there you have it: I am not a meta-ethical relativist. Humans and Pebblesorters do not have a fundamental disagreement about what one ought to do; humans do what they ought to do and Pebblesorters do what they p-ought to do. You can’t have a disagreement without some particular fact or computation or idealized abstract dynamic that you are both arguing about, different beliefs with the same referent. “That which an optimization process does” is not a belief that can be argued about, it is a property of that optimization process. What is right, on the other hand, or what is p-right, is something that can be argued about; but a human does not dispute which piles of pebbles are prime.
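To make that last point concrete: whether a heap of pebbles is prime is a straightforward computation with definite truth conditions, not something humans and Pebblesorters could meaningfully dispute. A minimal sketch (Python used purely for illustration, with primality of the pile size standing in for the Pebblesorters’ criterion, as in the parable):

```python
def p_correct(pile_size: int) -> bool:
    """A pile is 'p-correct' iff its size is prime -- a fact of arithmetic, not of taste."""
    if pile_size < 2:
        return False
    for d in range(2, int(pile_size ** 0.5) + 1):
        if pile_size % d == 0:
            return False
    return True

# Any agent running this computation gets the same answers; there is nothing to argue about.
print(p_correct(13))  # True:  a heap of 13 pebbles is p-correct
print(p_correct(15))  # False: 15 = 3 x 5, so a heap of 15 is not
```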
Two people who actually disagree about the same question, a common referent, can try to adjudicate it (even if they don’t have a nice neat formal procedure which is readily computable and known to them). I’m not sure what a “universal truth condition” is, but statements about rightness have truth conditions just as much as p-rightness.
Furthermore, I believe that human beings are better than Pebblesorters. This is not written upon the very stars, it is written in Platonia as the objective answer to that question that we ask when we ask “Is it better?” and not “Is it more xyblz?”
If you follow through on this view it seems to lead to the position that everyone has their own referent for “good”, and there is no meaningful way for two different humans to argue about whether a given action is good. Which would suggest there is little point trying to persuade other people to be good, or hoping to collaboratively construct a friendly AI (since an l-friendly AI is unlikely to be e-friendly).
Cooperation does not require modification of others to have identical values. Even agents with actively opposed values can cooperate (and so create a mutually friendly AI) so long as the opposition is not perfect in all regards.
This site has been at pains to emphasise that an AI will be an optimization process of never-before-seen power, rewriting reality in ways that we couldn’t possibly predict, and as such an AI whose values are even slightly misaligned with one’s own would be catastrophic for one’s actual values.
What is relevant to the decision to create such an AI, or to prevent it from operating, is the comparison between what will occur in the absence of the AI and what the AI will do. For example, gwern’s values are not identical to mine, but if I had the choice between pressing a button to release an FAI optimised for gwern’s values or a button to destroy it, I would press the button to release it. An FAI optimised for gwern’s values isn’t as good as one optimised for my own (by subjective tautology), but it is overwhelmingly better than nothing. I expect such an FAI to allow me to live for millions of years, and for the cosmic commons to be exploited to do things that I generally approve of. Without that AI I think it is most likely that my species and I will go to oblivion.
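A toy expected-utility comparison may make the shape of this argument clearer. The numbers below are entirely made up for illustration; only the ordering matters:

```python
# Hypothetical scores for outcomes under my own values, on an arbitrary 0..1 scale.
u_fai_mine  = 1.00  # an FAI optimised for my values (the subjective optimum)
u_fai_gwern = 0.80  # an FAI optimised for gwern's values: overlapping, not identical
u_no_fai    = 0.01  # no FAI at all: oblivion for me and my species as the likely default

# The release/destroy decision compares against the no-AI baseline, not against perfection.
print(u_fai_gwern > u_no_fai)    # True  -> press the release button
print(u_fai_gwern < u_fai_mine)  # True  -> still not as good as my own FAI (subjective tautology)
```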
The above doesn’t even take into account cooperation mechanisms. That’s just flat acceptance of optimisation for another’s values in preference to the distinctly sub-optimal outcome for my own. When agents with conflicting values cooperate, negotiation applies: if both agents are rational, and are in a situation where mutual FAI creation is possible but unilateral FAI creation can be prevented, then the result will be an FAI that optimises for a compromise of the two value systems. To whatever extent the values of the two agents are not perfectly opposed, this outcome will be superior to the non-cooperative outcome. For example, if gwern and I were in such a situation, the expected result would be the release of an FAI optimised for a compromise between our values. Neither of us would prefer that option over an FAI personalised to ourselves, but there is still a powerful incentive to cooperate: that outcome is better than what we would have without cooperation. The same applies if a paperclip maximiser and a staple maximiser are put in that situation. (It does not apply if a paperclip maximiser meets a paperclip minimiser.)
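As a toy sketch of that negotiation condition (again with made-up payoffs; the point is only the structural difference between partially opposed and perfectly opposed values):

```python
def both_gain(no_deal, compromise):
    """True iff *both* agents prefer the compromise FAI to the non-cooperative outcome."""
    (a0, b0), (a1, b1) = no_deal, compromise
    return a1 > a0 and b1 > b0

# Paperclip maximiser vs staple maximiser: different values, but not perfectly opposed.
# Splitting the cosmic commons beats mutual oblivion for both, so the compromise FAI gets built.
print(both_gain(no_deal=(0.0, 0.0), compromise=(0.5, 0.5)))   # True

# Paperclip maximiser vs paperclip *minimiser*: exactly opposed (zero-sum), so whatever
# one gains the other loses, and no compromise can beat non-cooperation for both.
print(both_gain(no_deal=(0.0, 0.0), compromise=(0.3, -0.3)))  # False
```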