If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible? When defining friendly, consider Archimedes’ Chronophone, which suggests that friendly AI would (should?) be friendly to just about any human who ever lived.
moral anti-realism—there are no (or insufficient) moral facts to resolve all moral disputes an agent faces.
People can hold different moral views. Sometimes these views are opposed and any compromise would be called immoral by at least one of them. Any AI that enforced such a compromise would be called unFriendly by at least one of them.
Even for a moral realist (and I don’t think well of that position), the above remains true, because people demonstrably have irreconcilably different moral views. If you’re a moral realist, you have the choice of:
1. Implement objective moral truth, however defined, and ignore everyone’s actual moral feelings. In which case FAI is irrelevant—if the moral truth tells you to be unFriendly, you do it.
2. Implement some pre-chosen function—your own morals, or many people’s morals like CEV, or some other thing that does not depend on moral truth.
If you’re a moral anti-realist, you can only choose 2, because no moral truth exists. That’s the only difference stemming from being a moral realist or anti-realist.
Does this mean that Friendly-to-everyone AI is impossible in moral anti-realism? Certainly, because people have fundamental moral disagreements. But moral realism doesn’t help! It just adds the option of following some “moral facts” which some or all humans disagree with, which is no better in terms of Friendliness than existing options. (If all humans agreed with some set of purported moral facts, people wouldn’t have needed to invent the concept of moral facts in the first place.)
The existence of moral disagreement, standing alone, is not enough to show moral realism is false. After all, scientific disagreement doesn’t show physical realism is false.
Further, I am confused by your portrayal of moral realists. Presumably, the reality of moral facts would show that people acting contrary to those facts were making a mistake, much like people who thought “Objects in motion will tend to come to a stop” were making a mistake. It seems strange to call correcting that mistake “ignoring everyone’s actual scientific feelings.” Likewise, if I am unknowingly doing wrong, and you can prove it, I would not view that correction as ignoring my moral feelings—I want to do right, not just think I am doing right.
In short, I think that the position you are labeling “moral realist” is just a very confused version of moral anti-realism. Moral realists can and should reject the idea that the mere existence at any particular moment of moral disagreement is useful evidence of whether there is one right answer. In other words, a distinction should be made between the existence of moral disagreement and the long-term persistence of moral disagreement.
The existence of moral disagreement, standing alone, is not enough to show moral realism is false.
I didn’t say that it was. Rather I pointed out the difference between morality and Friendliness.
For an AI to be able to be Friendly towards everyone requires not moral realism, but “friendliness realism”—which is basically the idea that a single behavior of the AI can satisfy everyone. This is clearly false if “everyone” means “all intelligences including aliens, other AIs, etc.” It may be true if we restrict ourselves to “all humans” (and stop humans from diversifying too much, and don’t include hypothetical or far-past humans).
I personally believe the burden of proof is on those who think this is possible to demonstrate it. My prior for “all humans” says they are a very diverse and selfish bunch and not going to be satisfied by any one arrangement of the universe.
Regardless, moral realism and friendliness realism are different. If you built an objectively moral but unFriendly AI, that’s the scenario I discussed in my previous comment—and people would be unhappy. OTOH, if you think a Friendly AI is by logical necessity a moral one (under moral realism), that’s a very strong claim about objective morals—a claim that people would perceive an AI implementing objective morals as Friendly. This is a far stronger claim than that people who are sufficiently educated and exposed to the right knowledge will come to agree with certain universal objective morals. A Friendly AI means one that is Friendly to people as they really are, here and now. (As I said, to me it seems very likely that an AI cannot in fact be Friendly to everyone at once.)
I think we are simply having a definitional dispute. As the term is used generally, moral realism doesn’t mean that each agent has a morality, but that there are facts about morality that are external to the agent (i.e. objective). Now, “objective” is not identical to “universal,” but in practice, objective facts tend to cause convergence of beliefs. So I think what I am calling “moral realism” is something like what you are calling “Friendliness realism.”
Lengthening the inferential distance further is the fact that realism is a two-place word. As you noted, there is a distinction between realism(Friendliness, agents) and realism(Friendliness, humans).
That said, I do think that “people would perceive an AI implementing objective morals as Friendly” if I believed that objective morals exist. I’m not sure why you think that’s a stronger claim than “people who are sufficiently educated and exposed to the right knowledge will come to agree with certain universal objective morals.” If you believed that there were objective moral facts and knew the content of those facts, wouldn’t you try to adjust your beliefs and actions to conform to those facts, in the same way that you would adjust your physical-world beliefs to conform to objective physical facts?
I think we are simply having a definitional dispute.
That seems likely. If moral realists think morality is a one-place word, and anti-realists think it’s a two-place word, we would be better served by using two distinct words.
It is somewhat unclear to me what moral realists are thinking of, or claiming, about whatever it is they call morality. (Even after taking into account that different people identified as moral realists do not all agree on the subject.)
So I think what I am calling “moral realism” is something like what you are calling “Friendliness realism.”
I defined ‘Friendliness (to X)’ as ‘behaving towards X in the way that is best for X in some implied sense’. Obviously there is no Friendliness towards everyone, but there might be Friendliness towards humans: then “Friendliness realism” (my coining) is the belief that there is a single Friendly-towards-humans behavior that will in fact be Friendly towards all humans. Whereas Friendliness anti-realism is the belief that no one behavior would satisfy all humans, and that any behavior would inevitably be unFriendly towards some of them.
Clearly this discussion assumes many givens. Most importantly, 1) what exactly counts as being Friendly towards someone (are we utilitarian? what kind? must we agree with the target human as to what is Friendly towards them? If we influence them to come to like us, when is that allowed?). 2) what is the set of ‘all humans’? Do past, distant, future expected, or entirely hypothetical people count? What is the value of creating new people? Etc.
My position is that: 1) for most common assumed answers to these questions, I am a “Friendliness anti-realist”; I do not believe any one behavior by a superpowerful universe-optimizing AI would count as Friendliness towards all humans at once. And 2), insofar as I have seen moral realism explained, it seems to me to be incompatible with Friendliness realism. But it’s possible some people mean something entirely different by “morals” and by “moral realism” than what I’ve read.
If you believed that there were objective moral facts and knew the content of those facts, wouldn’t you try to adjust your beliefs and actions to conform to those facts
That’s a tautology: yes I would. But, the assumption is not valid.
Even if you assume there exist objective moral facts (whatever you take that to mean), it does not follow that you would be able to convince other people that they are true moral facts! I believe it is extremely likely you would not be able to convince people—just as today most people in the world seem to be moral realists (mostly religious), yet hold widely differing moral beliefs, and when they convert to another set of beliefs it is almost never due to rational persuasion.
It would be nice to live in a world where you could start from the premise that “people believe that there are objective moral facts and know the content of those facts”. But in practice we, and any future FAI, will live in a world where most people will reject mere verbal arguments in favor of new morals contradicting their current ones.
No. FAI is supposed to implement an extrapolated version of mankind’s combined values, not search for an objectively defined moral code to implement.
Also: Eliezer has argued that even from its programmers’ perspective, some elements of an FAI’s moral code (Coherent Extrapolated Volition) will probably look deeply immoral. (But will actually be OK.)
Why does the moral anti-realist think “an extrapolated version of mankind’s combined values” exists or is capable of being created? For moral realists, the answer is easy—the existence of objective moral facts shows that, in principle, some moral system that all humans could endorse could be discovered/articulated.
As an aside, CEV is a proposed method for finding what an FAI would implement. I think that one could think FAI is possible even if CEV were the wrong track for finding what FAI should do. In short, CEV is not necessarily part of the definition of Friendly.
Well, to assert that “an extrapolated version of mankind’s combined values can be created” doesn’t really assert much, in and of itself… just that some algorithm can be implemented that takes mankind’s values as input and generates a set of values as output. It seems pretty likely that a large number of such algorithms exist.
Of course, what CEV proponents want to say, additionally, is that some of these algorithms are such that their output is guaranteed to be something that humans ought to endorse. (Which is not to say that humans actually would endorse it.)
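To see how little the bare claim asserts, here is a deliberately trivial sketch (hypothetical, and not anything anyone has proposed as CEV): even a crude majority vote over stated values is an algorithm that “takes mankind’s values as input and generates a set of values as output.”

```python
from collections import Counter
from typing import Iterable, Set

def naive_value_aggregation(value_sets: Iterable[Set[str]]) -> Set[str]:
    """Keep any value endorsed by a strict majority of the input agents.
    This satisfies the bare description 'values in, value set out' while
    saying nothing about whether anyone ought to endorse the result."""
    value_sets = list(value_sets)
    counts = Counter(v for s in value_sets for v in s)
    return {v for v, n in counts.items() if n > len(value_sets) / 2}

# Example: three agents with partially overlapping values.
print(naive_value_aggregation([
    {"honesty", "liberty"},
    {"honesty", "equality"},
    {"honesty", "liberty", "equality"},
]))  # all three values survive a majority vote here
```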
It’s not even clear to me that moral realists should believe this. That is, even if I posit that objective moral facts exist, it doesn’t follow that they can be derived from any algorithm applied to the contents of human minds.
But I agree with you that it’s still less clear why moral anti-realists should believe it.
If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible?
No. I mean, I’m unsure about the possibility of provably Friendly AI but it’s not obvious that anti-realism makes it impossible. Moral realism, were it the case, might make things easier but it’s hard for me to imagine what that world looks like.
Let us define a morality function F() as taking as input x = the factual circumstances an agent faces in making a decision, and outputting y = the decision the agent makes. It is fairly apparent that practically every agent has an F(). So ELIEZER(x) is the function that describes what Eliezer would choose in situation x. Next, define GROUP{} as the set of morality functions run by all the members of that group.
Let us define CEV() as the function that takes as input a morality function or set of morality functions and outputs a morality function that is improved/made consistent/extrapolated from the input. I’m not asserting the actual CEV formulation will do that, but it is a gesture towards the problem that CEV() is supposed to solve.
For clarity, let the output of CEV(F()) = CEV.F(). Thus, CEV.ELIEZER() is the extrapolated morality from the morality Eliezer is running. In parallel, CEV.AMERICA() (the output of CEV(AMERICA{})) is the single moral function that is the extrapolated morality of everyone in the United States. If CEV() exists, an AI considering/implementing CEV.JOHNDOE() is Friendly to John Doe. Likewise, CEV.GROUP() leads to an AI that is Friendly to every member of the group.
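For concreteness, here is a minimal sketch of this notation in Python (all names hypothetical; the body of cev is an explicit stub, since nobody knows how to write it):

```python
from typing import Callable, FrozenSet

Situation = str
Decision = str
MoralityFunction = Callable[[Situation], Decision]

def eliezer(x: Situation) -> Decision:
    """ELIEZER(x): whatever Eliezer would choose in situation x (stub)."""
    return f"Eliezer's decision in {x}"

# GROUP{}: the set of morality functions run by the members of a group.
AMERICA: FrozenSet[MoralityFunction] = frozenset({eliezer})  # placeholder membership

def cev(group: FrozenSet[MoralityFunction]) -> MoralityFunction:
    """CEV(): return one morality function extrapolated from the inputs.
    The extrapolation itself is entirely unspecified; returning an arbitrary
    member is only a stand-in so the types line up."""
    return next(iter(group))

cev_america = cev(AMERICA)         # CEV.AMERICA(): one extrapolated morality for the group
print(cev_america("situation q"))
```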
For FAI to be possible, CEV() must produce an output for (A) any morality function or (B) any set of morality functions. Further, for provable FAI, it must be possible to (C) mathematically show the output of CEV() before turning on the AI.
If moral realism is false, why is there reason to think (A), (B), or (C) are true?
For FAI to be possible, CEV() must produce an output for (A) any morality function or (B) any set of morality functions
Any set? Why not just require that CEV.HUMANITY() be possible? It seems like there are some sets of morality functions G that would be impossible (G={x, ~x}?). Human value is really complex so it’s a difficult thing to a) model it and b) prove the model. Obviously I don’t know how to do that; no one does yet. If moral realism were true and morality were simple and knowable I suppose that would make the job a lot easier… but that doesn’t seem like a world that is still possible. Conversely, morality could be both real and unknowable and impossibly complicated, and then we’d be in even worse shape, because learning about human values wouldn’t even tell us how to do Friendly AI! Maybe if you gave me some idea of what your alternative to anti-realism would look like I could answer better. In short: Friendliness is really hard; part of the reason it seems so hard to me might have to do with my moral anti-realism, but I have trouble imagining plausible realist worlds where things are easier.
First, a terminology point: CEV.HUMANITYCURRENTLYALIVE() != CEV.ALLHUMANITYEVER(). For the anti-realist, CEV.HUMANITYCURRENTLYALIVE() is massively more plausible, and CEV.LONDON() is more plausible than that—but my sense is that this sentence depends on the anti-realist accepting some flavor of moral relativism.
Second, it seems likely that fairly large groups (e.g. the population of London) already have some {P, ~P}. That’s one reason to think making CEV() is really hard.
Human value is really complex so it’s a difficult thing to a) model it and b) prove the model.
I don’t understand what proving the model means in this context.
If moral realism were true and morality were simple and knowable I suppose that would make the job a lot easier… but that doesn’t seem like a world that is still possible.
I don’t understand why you talk about possibility. “Morality is true, simple, and knowable” seems like an empirical proposition: it just turns out to be false. It isn’t obvious to me that simple moral realism is necessarily false in the way that 2+5=7 is necessarily true.
Conversely, morality could be both real and unknowable
How does the world look different if morality is real and inaccessible vs. not real?
Maybe if you gave me some idea of what your alternative to anti-realism would look like I could answer better.
Pace certain issues about human appetites as objective things, I am an anti-realist—in case that wasn’t clear.
First, a terminology point: CEV.HUMANITYCURRENTLYALIVE() != CEV.ALLHUMANITYEVER
Sure sure. But CEV.ALLHUMANITYEVER() is also not the same as CEV.ALLPOSSIBLEAGENTS().
Second, it seems likely that fairly large groups (e.g. the population of London) already have some {P, ~P}.
Some subroutines are probably inverted, but there probably aren’t people whose utility functions are the full negation of other people’s. Trade-offs needn’t mean irreconcilable differences. Like I doubt there is anyone in the world who cares as much as you do about the exact opposite of everything you care about.
Human value is really complex so it’s a difficult thing to a) model it and b) prove the model.
I don’t understand what proving the model means in this context.
Show with some confidence that it doesn’t lead to terrible outcomes if implemented.
“Morality is true, simple, and knowable” seems like an empirical proposition: it just turns out to be false. It isn’t obvious to me that simple moral realism is necessarily false in the way that 2+5=7 is necessarily true.
I’m not sure that it is. But when I said “still” possible I meant that we have more than enough evidence to rule out the possibility that we are living in such a world. I didn’t mean to imply any beliefs about necessity. That said I am pretty confused about what it would mean for there to be objective facts about right and wrong. Usually I think true beliefs are supposed to constrain anticipated experience. Since moral judgments don’t do that… I’m not quite sure I know what moral realism would really mean.
How does the world look different if morality is real and inaccessible vs. not real?
I imagine it wouldn’t look different but since there is no obvious way of proving a morality logically or empirically I can’t see how moral realists would be able to rule it out.
Pace certain issues about human appetites as objective things, I am an anti-realist—in case that wasn’t clear.
Oh I understand that. I just meant that when you ask:
If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible?
I’m wondering “Opposed to what?”. I’m having trouble imagining the person for whom the prospects of Friendly AI are much brighter because they are a moral realist.
If I’m a moral anti-realist, do I necessarily believe that provably Friendly AI is impossible?
I’m wondering “Opposed to what?”. I’m having trouble imagining the person for whom the prospects of Friendly AI are much brighter because they are a moral realist.
It seems to me that moral realists have more reason to be optimistic about provably friendly AI than anti-realists. The steps to completion are relatively straightforward: (1) Rigorously describe the moral truths that make up the true morality. (2) Build an AGI that maximizes what the true morality says to maximize.
I’m not quite sure I know what moral realism would really mean.
I think Alice, a unitary moral realist, believes she is justified in saying: “Anyone whose morality function does not output Q in situation q is a defective human, roughly analogous to the way any human who never feels hungry is defective in some way.”
Bob, a pluralist moral realist, would say: “Anyone whose morality function does not output from the set {Q1, Q2, Q3} in situation q is a defective human.”
Charlie, a moral anti-realist, would say Alice’s and Bob’s statements both suffer from some problem: they are misleading, merely historically contingent, incapable of being evaluated for truth, or something else.
Consider the following statement:
“Every (moral) decision a human will face has a single choice that is most consistent with human nature.”
To me, that position implies that moral realism is true. If you disagree, could you explain why?
I imagine it wouldn’t look different [if morality is real and inaccessible vs. not real] but since there is no obvious way of proving a morality logically or empirically I can’t see how moral realists would be able to rule it out.
What is at stake in the distinction? A set of facts that cannot have causal effect might as well not exist. Compare error theorists to inaccessibility moral realists—the former say value statements cannot be evaluated for truth, the latter say value statements could be true, but in principle, we will never know. For any actual problem, both schools of thought recommend the same stance, right?
moral realists have more reason to be optimistic about provably friendly AI than anti-realists. The steps to completion are relatively straightforward: (1) Rigorously describe the moral truths that make up the true morality. (2) Build an AGI that maximizes what the true morality says to maximize.
Is step 1 even necessary? Presumably in that universe one could just build an AGI that was smart enough to infer those moral truths and implement them, and turn it on secure in the knowledge that even if it immediately started disassembling all available matter to make prime-numbered piles of paperclips, it would be doing the right thing. No?
That’s an interesting point. I suppose it depends on whether a moral realist can think something can be morally right for one class of agents and morally wrong for another class. I think such a position is consistent with moral realism. If that is a moral realist position, then the AI programmer should be worried that an unconstrained AI would naturally develop a morality function different than CEV.HUMANITY().
In other words, when we say moral realist, are we using a two-place word with unfortunate ambiguity between realism(morality, agent) and realism(morality, humans)? Wow, I never considered whether this was part of the inferential distance in these types of discussions.
Well, to start with, I would say that CEV is beside the point here. In a universe where there exist moral truths that make up the true morality, if what I want is to do the right thing, there’s no particular reason for me to care about anyone’s volition, extrapolated or otherwise. What I ought to care about is discerning those moral truths. Maybe I can discern them by analyzing human psychology, maybe by analyzing the human genome, maybe by analyzing the physical structure of carbon atoms, maybe by analyzing the formal properties of certain kinds of computations, I dunno… but whatever lets me figure out those moral truths, that is what I ought to be attending to in such a universe, and if humanity’s volition conflicts with those truths, so much the worse for humanity.
So the fact that an unconstrained AI might—or even is guaranteed to—develop a morality function different than CEV.HUMANITY() is not, in that universe, a reason not to build an unconstrained AI. (Well, not a moral reason, anyway. I can certainly choose to forego doing the right thing in that universe if it turns out to be something I personally dislike, but only at the cost of behaving immorally.)
But that’s beside your main point, that even in that universe the moral truths of the universe might be such that different behaviors are most right for different agents. I agree with this completely. Another way of saying it is that total rightness is potentially maximized when different agents are doing (specific) different things. (This might be true in a non-moral-realist universe as well.)
Actually, it may be useful here to be explicit about what we think a moral truth is in that universe. That is, is it a fact about the correct state of the world? Is it a fact about the correct behavior of an agent in a given situation, independent of consequences? Is it a fact about the correct way to be, regardless of behavior or consequences? Is it something else?