Human values differ as much as values can differ

George Hamilton’s autobiography Don’t Mind if I Do, and the very similar book by Robert Evans, The Kid Stays in the Picture, give a lot of insight into human nature and values. For instance: What do people really want? When people have the money and fame to travel around the world and do anything that they want, what do they do? And what is it that they value most about the experience afterward?

You may argue that the extremely wealthy and famous don’t represent the desires of ordinary humans. I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don’t represent, or even allow themselves to acknowledge, the actual desires of humans.

I noticed a pattern in these books: The men in them value social status primarily as a means to an end, while the women value social status as an end in itself.

“Male” and “female” values

This is a generalization; but, at least at the very upper levels of society depicted in these books, and a few others like them that I’ve read, it’s frequently borne out. (Perhaps a culture chooses celebrities who reinforce its stereotypes.) Women and men alike appreciate expensive cars and clothing. But the impression I get is that the flamboyantly extravagant are surprisingly non-materialistic. Other than food (and, oddly, clothing), the very wealthy themselves consistently refer to these trappings as things that they need in order to signal their importance to other people. They don’t have an opinion on how long or how tall a yacht “ought” to be; they just want theirs to be the longest or tallest. The persistent phenomenon whereby the wealthier someone appears, the more likely they are to go into debt, is not because these people are too stupid or impulsive to hold on to their money (as in popular depictions of the wealthy, e.g., A New Leaf). It’s because they are deliberately trading monetary capital for the social capital that they actually desire (and expect to be able to trade it back later if they wish to, even making a profit on the “transaction”, as Donald Trump has done so well).

With most of the women in these books, that’s where it ends. What they want is to be the center of attention. They want to walk into a famous night-club and see everyone’s heads turn. They want the papers to talk about them. They want to be able to check into a famous hotel at 3 in the morning and demand that the head chef be called at home, woken up, and brought in immediately to cook them a five-course meal. Some of the women in these stories, like Elizabeth Taylor, routinely make outrageous demands just to prove that they’re more important than other people.

What the men want is women. Quantity and quality. They like social status, and they like to butt heads with other men and beat them; but once they’ve acquired a bevy of beautiful women, they are often happy to retire to their mansion or yacht and enjoy them in private for a while. And they’re capable of forming deep, private attachments to things, in a way the women are less likely to. A man can obsess over his collection of antique cars as beautiful things in and of themselves. A woman will not enjoy her collection of Faberge eggs unless she has someone to show it to. (Preferably someone with a slightly less-impressive collection of Faberge eggs.) Reclusive celebrities are more likely to be men than women.

Some people mostly like having things. Some people mostly like having status. Do you see the key game-theoretic distinction?

Neither value is easily satisfied by creating more wealth. Give everybody a Rolls-Royce, and the women still have the same social status, and the men don’t have any more women. But the “male” value is more amenable to it. Men compete, but perhaps mainly because the quality of women is roughly normally distributed, so women at the top of the distribution are scarce. The status-related desires of the men described above are, in theory, capable of being mutually satisfied. The women’s are not.

Non-positional / Mutually-satisfiable vs. Positional / Non-mutually-satisfiable values

No real person implements pure mutually-satisfiable or non-mutually-satisfiable values. I have not done a study or taken a survey, and don’t claim that these views correlate with sex in general. I just wanted to make accessible the evidence I saw that these two types of values exist in humans. The male/female distinction isn’t what I want to talk about; it just helped organize the data in a way that made this distinction pop out for me. I could also have told a story about how men and women play sports, and claim that men are more likely to want to win (a non-mutually-satisfiable value), and women are more likely to just want to have fun (a mutually-satisfiable value). Let’s not get distracted by sexual politics. I’m not trying to say something about women or about men; I’m trying to say something about FAI.

I will now rename them “non-positional” and “positional” (as suggested by SilasBarta and wnoise), where “non-positional” means assigning a value to something from category X according to its own properties (mutually-satisfiable), and “positional” means assigning a value to something from category X according to the rank of its non-positional value within the set of all X (non-mutually-satisfiable).
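
Here is a minimal sketch of the distinction in Python. The yachts, names, and numbers are invented purely for illustration; the point is only that a non-positional valuation looks at an item’s own properties, while a positional valuation looks only at where those properties rank among all items of the same kind, so improving everyone’s properties leaves every positional value unchanged.

```python
def nonpositional_value(item, quality):
    """Value an item by its own intrinsic quality."""
    return quality[item]

def positional_value(item, quality):
    """Value an item by the rank of its quality among all items (0 = worst)."""
    ranked = sorted(quality, key=quality.get)   # lowest quality first
    return ranked.index(item)

yachts = {"A": 40.0, "B": 55.0, "C": 90.0}      # hypothetical lengths in metres

# Doubling every length raises every non-positional value...
doubled = {k: 2 * v for k, v in yachts.items()}
print([nonpositional_value(y, yachts) for y in yachts])   # [40.0, 55.0, 90.0]
print([nonpositional_value(y, doubled) for y in yachts])  # [80.0, 110.0, 180.0]

# ...but leaves every positional value exactly where it was.
print([positional_value(y, yachts) for y in yachts])      # [0, 1, 2]
print([positional_value(y, doubled) for y in yachts])     # [0, 1, 2]
```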

Now imagine two friendly AIs, one non-positional and one positional.

The non-positional FAI has a tough task. It wants to give everyone what it imagines they want.

But the positional FAI has an impossible task. It wants to give everyone what it thinks they value, which is to be considered better than other people, or at least better than other people of the same sex. But status is a zero-sum value. It’s very hard to give more status to one person without taking the same amount of status away from other people. There might be some clever solution involving sending people on trips at relativistic speeds so that the time each person is high-status seems longer to them than the time they are low-status, or using drugs to heighten their perceptions of high status and diminish the pain of low status. For an average utilitarian, the best solution is probably to kill off everyone except one man and one woman. (Painlessly, of course.)
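
A toy model of why this is zero-sum, on the simplifying and purely illustrative assumption that status can be modelled as rank by wealth with no ties (the names and figures are invented): however the FAI redistributes the underlying goods, the total amount of rank-status is fixed, so anyone’s gain in status is exactly someone else’s loss.

```python
def status_ranks(wealth):
    """Each person's status = their rank by wealth (0 = lowest, N-1 = highest)."""
    order = sorted(wealth, key=wealth.get)
    return {person: rank for rank, person in enumerate(order)}

before = {"Ann": 1, "Bea": 5, "Cal": 9, "Dee": 12}
after = dict(before, Ann=100)                 # the FAI makes Ann the richest

print(status_ranks(before))                   # {'Ann': 0, 'Bea': 1, 'Cal': 2, 'Dee': 3}
print(status_ranks(after))                    # {'Bea': 0, 'Cal': 1, 'Dee': 2, 'Ann': 3}
print(sum(status_ranks(before).values()),     # 6 6 -- the total never changes;
      sum(status_ranks(after).values()))      # Ann's gain is the others' loss
```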

A FAI trying to satisfy one of these preferences would take society in a completely different direction than a FAI trying to satisfy the other. From the perspective of someone with the job of trying to satisfy these preferences for everyone, they are as different as it is possible for preferences to be, even though they are taken (in the books mentioned above) from members of the same species at the same time in the same place in the same strata of the same profession.

Correcting value “mistakes” is not Friendly

This is not a problem that can be resolved by popping up a level. If you say, “But what people who want status REALLY want is something else that they can use status to obtain,” you’re just denying the existence of status as a value. It’s a value. When given the chance to either use their status to attain something else, or keep pressing the lever that gives them a “You’ve got status!” hit, some people choose to keep pressing the lever.

If you claim that these people have formed bad habits, and improperly short-circuited a connection from value to stimulus, and can be re-educated to instead see status as a means, rather than as an end… I might agree with you. But you’d make a bad, unfriendly AI. If there’s one thing FAIers have been clear about, it’s that changing top-level goals is not allowed. (That’s usually said with respect to the FAI’s top-level goals, not with respect to the human top-level goals. But, since the FAI’s top-level goal is just to preserve human top-level goals, it would be pointless to make a lot of fuss making sure the FAI held its own top-level goals constant, if you’re going to “correct” human goals first.)

If changing top-level goals is allowed in this instance, or this top-level goal is considered “not really a top-level goal”, I would become alarmed and demand an explanation of how a FAI distinguishes such pseudo-top-level-goals from real top-level goals.

If a computation can be conscious, then changing a conscious agent’s computation changes its conscious experience

If you believe that computer programs can be conscious, then unless you have a new philosophical position that you haven’t told anyone about, you believe that consciousness can be a by-product of computation. This means that the formal, computational properties of people’s values are not just critical, they’re the only thing that matters. And it means that there is no way to abstract the bad property of being zero-sum away from a value without destroying the value.

In other words, it isn’t valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status. If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:

  • Consciousness is not dependent on computational structure (this leads to vitalism); or

  • You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or

  • You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations (e.g., the “Build human-seeming robots” or “For every person, a volcano-lair” approaches mentioned in the comments).

Summary

This discussion has uncovered several problems for an AI trying to give people what they value without changing what they value. In increasing order of importance:

  • If you have a value associated with a sensation that is caused by a stimulus, it isn’t clear when it’s legitimate for a FAI to reconnect the sensation to a different stimulus and claim it’s preserved the value. Maybe it’s morally okay for a person to rewire their kids to switch their taste-perceptions of broccoli and ice cream. But is an AI still friendly if it does this?

  • It isn’t okay to do this with the valuation of social status. Social status has a simple formal (mathematical) structure requiring some agents to have low status in order for others to have high status. The headache that status poses for a FAI trying to satisfy it is a result of this formal structure. You can’t abstract it away, and you can’t legitimately banish it by reconnecting a sensation associated with it to a different stimulus, because the agent would then use that sensation to drive different behavior, meaning the value is now part of a different computational structure, and a different conscious experience. You either preserve the problematic formal structure, or you throw out the value.

  • Some top-level human goals lead to conflict. You can’t both eliminate conflict, and preserve human values. It’s irresponsible, as well as creepy, when some people (I’m referring to some comments made on LW that I can’t find now) talk about Friendly AI the same way that Christians talk about the Second Coming, as a future reign of perfect happiness for all when the lamb will lie down with the lion. That is a powerful attractor that you don’t want to go near, unless you are practicing the Dark Arts.

  • The notion of top-level goal is clear only in a 1960s classic symbolic AI framework. The idea of a “top-level goal” is an example of what I called the “Prime mover” theory of network concepts. In humans, a subsidiary goal, like status, can become a top-level goal via classic behavioristic association. It happens all the time. But the “preferences” that the FAI is supposed to preserve are human top-level goals. How’s it supposed to know which top-level goals are sacrosanct, and which ones are just heuristics or erroneous associations?

  • Reconciling human values may not be much easier or more sensible than reconciling all values, because human values already differ as much as it is possible for values to differ. Sure, humans have only covered a tiny portion of the space of possible values. But we’ve just seen two human values that differ along the critical dimensions of being mutually satisfiable or not, and of encouraging global cooperation or not. The harmonic series looks a lot like Zeno’s geometric series (both are written out below), yet one diverges and the other converges. It doesn’t matter that the terms in each look similar; they’re as different as series can be. In the same way, values taken from any conceivable society of agents can be classified into mutually-satisfiable or not mutually-satisfiable. For the purposes of a Friendly AI, a mutually-satisfiable value held by gas clouds in Antares is more similar to a mutually-satisfiable human value than either is to a non-mutually-satisfiable human value.
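
For concreteness, the two series from the last bullet, written out (these are standard results, included only as illustration of the analogy):

```latex
% Superficially similar terms, opposite behavior:
\sum_{n=1}^{\infty} \frac{1}{n} \;=\; 1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \cdots \;=\; \infty
\qquad \text{vs.} \qquad
\sum_{n=1}^{\infty} \frac{1}{2^{n}} \;=\; \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots \;=\; 1
```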