Friendly to who?

At http://lesswrong.com/lw/ru/the_bedrock_of_fairness/ldy, Eliezer mentions two challenges he often gets: “Friendly to who?” and “Oh, so you get to say what ‘Friendly’ means.” At the moment I see only one true answer to these questions, which I give below. If you can propose alternatives in the comments, please do.

I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved. Therefore, let’s imagine a dialogue between A and B.

A: Okay, so you’re interested in Friendly AI. Who will it be Friendly toward?

B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.

A: So the people who make the system decide what “Friendly” means?

B: Yes.

A: Then they could decide that it will be Friendly only toward them, or toward White people. Isn’t that sort of selfishness or racism immoral?

B: I can try to answer questions about the world, so if you can define morality so I can do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them. What do you mean by morality?

A: I don’t know. If it doesn’t mean anything, why do people talk about morality so much?

B: People often profess beliefs to label themselves as members of a group. So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs. I don’t have any other explanation for why people talk so much about something that isn’t subject to experimentation.

A: So if that’s what morality is, then it’s fundamentally meaningless unless I’m planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup. If that’s all it is, there’s no constraint on how a Friendly AI works, right? Maybe you’ll build it and it will only be Friendly toward B.

B: No, because I can’t do it by myself. Suppose I approach you and say “I’m going to make a Friendly AI that lets me control it and doesn’t care about anyone else’s preference.” Would you help me?

A: Obviously not.

B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I’m not up to that. There are a few other proposed notions of Friendliness that are nonviable for similar reasons. For example, suppose I approached you and said, “I’m going to make a Friendly AI that treats everyone fairly, but I don’t want to let anybody inspect how it works.” Would you help me?

A: No, because I wouldn’t trust you. I’d assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn’t need the lie any more.

B: Right. Here’s an ethical system that fails another way: “I’ll make an FAI that cares about every human equally, no matter what they do.” To keep it simple, let’s assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible. Would you help me build that?

A: Well, it fits with my intuitive notion of morality, but it’s not clear what incentive I have to help. If you succeed, I seem to win equally at the end whether I help you or not. Why bother?

B: Right. There are several possible fixes for that. Perhaps if I don’t get your help, I won’t succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically. That gives you an incentive to help.

A: Not much of one. You’ll surely need a lot of help, and maybe if all those other people help I won’t have to. Everyone would make the same decision and nobody would help.

B: Right. I could solve that problem by paying helpers like you money, if I had enough money. Another option would be to tilt the Friendliness in the direction of helpers in proportion to how much they help me.

A: But isn’t tilting the Friendliness unfair?

B: Depends. Do you want things to be fair?

A: Yes, for some intuitive notion of “fairness” I can’t easily describe.

B: So if the AI cares what you want, that will cause it to figure out what you mean by “fair” and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?

A: I suppose so. No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness.

B: Yes, that’s the best idea I have right now. Here’s another alternative: What would happen if we only took action when there’s a consensus about how to weight the fairness?

A: Well, by some estimates about 4% of the population are sociopaths. They, and perhaps others, would make ridiculous demands and prevent any consensus. Then we’d be waiting forever to build this thing, and someone else who doesn’t care about consensus would move while we’re dithering and make us irrelevant. So we’ll have to act and do something reasonable without a consensus about what that is, and maybe it makes sense to proceed now. So how about it? Do you need help yet?

B: Nope, I don’t know how to make it.

A: Damn. Hmm, do you think you’ll figure it out before everybody else?

B: Probably not. There are a lot of everybody else. In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems. I don’t see how I can take action before all of them.

A: Me neither. We are so screwed.