Fake Utility Functions

Every now and then, you run across someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.

I run across more of these people than you do. Only in my case, it’s people who know the amazingly simple utility function that is all you need to program into an artificial superintelligence and then everything will turn out fine.

(This post should come as an anticlimax, since you already know virtually all the concepts involved, I bloody well hope. See yesterday’s post, and all my posts since October 31st, actually...)

Some people, when they encounter the how-to-program-a-superintelligence problem, try to solve the problem immediately. Norman R. F. Maier: “Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any.” Robyn Dawes: “I have often used this edict with groups I have led—particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately.” Friendly AI is an extremely tough problem, so people solve it extremely fast.

There are several major classes of fast wrong solutions I’ve observed, and one of these is the Incredibly Simple Utility Function That Is All A Superintelligence Needs For Everything To Work Out Just Fine.

I may have contributed to this problem with a really poor choice of phrasing, years ago when I first started talking about “Friendly AI”. I referred to the optimization criterion of an optimization process—the region into which an agent tries to steer the future—as the “supergoal”. I’d meant “super” in the sense of “parent”, the source of a directed link in an acyclic graph. But it seems the effect of my phrasing was to send some people into happy death spirals as they tried to imagine the Superest Goal Ever, the Goal That Overrides All Other Goals, the Single Ultimate Rule From Which All Ethics Can Be Derived.

But a utility function doesn’t have to be simple. It can contain an arbitrary number of terms. We have every reason to believe that insofar as humans can be said to have values, there are lots of them—high Kolmogorov complexity. A human brain implements a thousand shards of desire, though this fact may not be appreciated by one who has not studied evolutionary psychology. (Try to explain this without a full, long introduction, and the one hears “humans are trying to maximize fitness”, which is exactly the opposite of what evolutionary psychology says.)

So far as descriptive theories of morality are concerned, the complicatedness of human morality is a known fact. It is a descriptive fact about human beings, that the love of a parent for a child, and the love of a child for a parent, and the love of a man for a woman, and the love of a woman for a man, have not been cognitively derived from each other or from any other value. A mother doesn’t have to do complicated moral philosophy to love her daughter, nor extrapolate the consequences to some other desideratum. There are many such shards of desire, all different values.

Leave out just one of these values from a superintelligence, and even if you successfully include every other value, you could end up with a hyperexistential catastrophe, a fate worse than death. If there’s a superintelligence that wants everything for us that we want for ourselves, except the human values relating to controlling your own life and achieving your own goals, that’s one of the oldest dystopias in the book. (Jack Williamson’s “With Folded Hands”, in this case.)

So how does the one constructing the Amazingly Simple Utility Function deal with this objection?

Objection? Objection? Why would they be searching for possible objections to their lovely theory? (Note that the process of searching for real, fatal objections isn’t the same as performing a dutiful search that amazingly hits on only questions to which they have a snappy answer.) They don’t know any of this stuff. They aren’t thinking about burdens of proof. They don’t know the problem is difficult. They heard the word “supergoal” and went off in a happy death spiral around “complexity” or whatever.

Press them on some particular point, like the love a mother has for her children, and they reply “But if the superintelligence wants ‘complexity’, it will see how complicated the parent-child relationship is, and therefore encourage mothers to love their children.” Goodness, where do I start?

Begin with the motivated stopping: A superintelligence actually searching for ways to maximize complexity wouldn’t conveniently stop if it noticed that a parent-child relation was complex. It would ask if anything else was more complex. This is a fake justification; the one trying to argue the imaginary superintelligence into a policy selection didn’t really arrive at that policy proposal by carrying out a pure search for ways to maximize complexity.

The whole argument is a fake morality. If what you really valued was complexity, then you would be justifying the parental-love drive by pointing to how it increases complexity. If you justify a complexity drive by alleging that it increases parental love, it means that what you really value is the parental love. It’s like giving a prosocial argument in favor of selfishness.

But if you consider the affective death spiral, then it doesn’t increase the perceived niceness of “complexity” to say “A mother’s relationship to her daughter is only important because it increases complexity; consider that if the relationship became simpler, we would not value it.” What does increase the perceived niceness of “complexity” is saying, “If you set out to increase complexity, mothers will love their daughters—look at the positive consequence this has!”

This point applies whenever you run across a moralist who tries to convince you that their One Great Idea is all that anyone needs for moral judgment, and proves this by saying, “Look at all these positive consequences of this Great Thingy”, rather than saying, “Look at how all these things we think of as ‘positive’ are only positive when their consequence is to increase the Great Thingy.” The latter being what you’d actually need to carry such an argument.

But if you’re trying to persuade others (or yourself) of your theory that the One Great Idea is “bananas”, you’ll sell a lot more bananas by arguing how bananas lead to better sex, rather than claiming that you should only want sex when it leads to bananas.

Unless you’re so far gone into the Happy Death Spiral that you really do start saying “Sex is only good when it leads to bananas.” Then you’re in trouble. But at least you won’t convince anyone else.

In the end, the only process that reliably regenerates all the local decisions you would make given your morality, is your morality. Anything else—any attempt to substitute instrumental means for terminal ends—ends up losing purpose and requiring an infinite number of patches because the system doesn’t contain the source of the instructions you’re giving it. You shouldn’t expect to be able to compress a human morality down to a simple utility function, any more than you should expect to compress a large computer file down to 10 bits.
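
To spell out the counting argument behind that last analogy (a minimal sketch; the 10-bit figure is purely illustrative):

$$
2^{10} = 1024 \ \text{distinct 10-bit descriptions}, \qquad 2^{n} \ \text{distinct files of length } n \text{ bits}.
$$

For any file length n > 10, at most 1024 of the 2^n possible files can be given their own 10-bit encoding; every other file must either share an encoding or throw information away. The same bound is why a morality of high Kolmogorov complexity can’t be losslessly regenerated from a utility function whose description is much shorter than the values it is supposed to encode.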

Addendum: Please note that we’re not yet ready to discuss Friendly AI, as such, on Overcoming Bias. That will require a lot more prerequisite material. This post is only about why simple utility functions fail to compress our values.