Humans don’t generally have utility functions

First, the background:

Humans obviously have a variety of heuristics and biases that lead to non-optimal behavior. But can this behavior truly not be described by a utility function?

Well, the easiest way to show that utility isn’t described by a function is to show the existence of cycles. For example, if I prefer A to B, B to C, and C to A, that’s a cycle: if all three are available I’ll never settle on one, and if each switch were an increase in utility, my utility would blow up to infinity! Well, really, it simply becomes undefined, since no assignment of numbers can satisfy u(A) > u(B) > u(C) > u(A) all at once.
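Here’s a toy way to see that (my own sketch, with made-up options A, B, and C): brute-force every strict ranking of the three options and check whether any of them agrees with the cyclic preferences. Since any utility function over a finite set of options induces such a ranking, if no ranking works, no utility function does either.

```python
# Toy check: cyclic strict preferences have no utility-function representation.
from itertools import permutations

options = ["A", "B", "C"]
prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (x, y) means "x is strictly preferred to y"

def representable(options, prefers):
    """True if some strict ranking (and hence some utility function)
    agrees with every stated strict preference."""
    for ranking in permutations(options):               # ranking[0] is the best option
        position = {x: i for i, x in enumerate(ranking)}
        if all(position[a] < position[b] for a, b in prefers):
            return True
    return False

print(representable(options, prefers))   # False: no utility function fits the cycle
```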

Do we have real-world examples of cycles in utility comparisons? Sure. For a cycle of size 2, Eliezer cites the odd behavior of people with regard to money and probabilities of money. However, the money-pumps he cites are rather inefficient. Almost any decision that seems “arbitrary” to us can be translated into a cycle. For example, anchoring means that people assign a higher value to a toaster when a more expensive toaster is sitting next to it. But most people, if asked, would assign only a tiny value to adding or removing an option they know they won’t buy. Chain those two judgments together and they must value the toaster more than they value the very same toaster. The conjunction fallacy can be made into a cycle by the same reasoning: ask people to bet on two things happening together, then ask them to bet on just one of those things happening on its own.
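To make the betting version concrete, here’s a hypothetical Dutch-book sketch (the prices are made up for illustration, not taken from any study): anyone who will pay more for a bet on “A and B” than for a bet on “A” alone can be relieved of money with zero risk, because the single-event bet pays off in every world where the conjunction does.

```python
# Hypothetical Dutch-book arithmetic for the conjunction-fallacy cycle.
# Made-up prices: someone pays 0.40 for a $1 bet on "A and B" but only
# 0.30 for a $1 bet on "A" alone.
price_conjunction = 0.40   # what they pay us for the "A and B" bet
price_single = 0.30        # what we pay them for the "A" bet

# We sell them the conjunction bet and buy the single-event bet from them.
upfront_profit = price_conjunction - price_single   # 0.10, banked immediately

# "A" happens in every world where "A and B" happens, so whatever we owe
# on the conjunction bet is always covered by what we collect on "A".
for A, B in [(True, True), (True, False), (False, True), (False, False)]:
    payout_owed = 1.0 if (A and B) else 0.0      # our liability
    payout_received = 1.0 if A else 0.0          # our coverage
    assert payout_received >= payout_owed

print(f"guaranteed profit of at least {upfront_profit:.2f}")
```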

So at the very least, not all humans have utility functions, which means that the human brain doesn’t automatically give us a utility function to use—if we want one, we have to sculpt it ad-hoc out of intuitions using our general reasoning, and like most human things it probably won’t be the best ever.

So, what practical implications does this have, aside from “people are weird”?

Well, I can think of two interesting things. First, there are the implications for utilitarian ethics. If utility functions are arbitrary not just from person to person, but even within a single person, then choosing between options using utilitarian ethics requires stronger, more universal moral arguments. The introspective “I feel like X, therefore my utility function must include that” is now a weak argument, even to yourself! The claim that “a utility function over universe-states exists” keeps its useful consequences, though, like alerting you that something is wrong when you encounter a cycle in your preferences, and of course supporting consequentialism.

Interesting thing two: application to AI design. The first argument goes something like “well, if it works for humans, why wouldn’t it work for AIs?” The first answer, of course, is “because an AI that had a loop would get stuck in it.” But that answer is sketchy, because *humans* don’t get stuck in their cycles. We do interesting things like (see the sketch after this list):

  • a meta-level aversion to infinite loops.

  • resolving equivalences/cycles with “arbitrary” associations.

  • not actually “looping” when we have a cycle, but doing something else that resembles utility maximization much less.
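Here’s an entirely hypothetical toy sketch of the second and third bullets (the choose function and the preference set are mine, not anyone’s real AI design): an agent that follows its pairwise preferences until it notices it is revisiting an option, then settles things with an arbitrary but stable tie-break instead of looping.

```python
# Toy "lots of ad-hoc rules" agent: follow pairwise preferences until a
# repeat is detected, then break the cycle with an arbitrary rule rather
# than looping forever.

def choose(options, prefers):
    """prefers(a, b) -> True if a is strictly preferred to b (may be cyclic)."""
    current = options[0]
    seen = {current}
    while True:
        better = [o for o in options if o != current and prefers(o, current)]
        if not better:
            return current                 # a genuine local optimum, no cycle hit
        nxt = better[0]
        if nxt in seen:
            # Ad-hoc rule: on detecting a cycle, don't loop; pick arbitrarily.
            return min(seen)               # e.g. alphabetical order
        seen.add(nxt)
        current = nxt

# Cyclic preferences: A > B, B > C, C > A.
cyclic = lambda a, b: (a, b) in {("A", "B"), ("B", "C"), ("C", "A")}
print(choose(["A", "B", "C"], cyclic))     # settles on "A" instead of looping
```

The tie-break rule here is deliberately dumb; the point is just that “notice the cycle and do something else” is a rule you can write down, rather than something you derive from a utility function.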

So replacing a huge utility function with a huge-but-probably-much-smaller set of ad-hoc rules could actually work for AIs, if we copy over the right things from human cognitive structure. Would it be possible to make such an AI Friendly, or some equivalent? Well, my first answer is “I don’t see why not.” It seems about as possible as doing it for utility-maximizing AIs. I can think of a few avenues that, if they panned out, would make it even simpler (the simplest being “include the Three Laws”), but it could plausibly be unsolvable as well.

The second argument goes something like “it may very well be better to do it with a list of guidelines than with a utility function.” For example, Eliezer makes the convincing argument that fun is not a destination, but a path. What part of that makes sense from a utilitarian perspective? It’s a very human, very lots-of-rules way of understanding things. So why not try to make an AI that can intuitively understand what it’s like to have fun? Hell, why not make an AI that can have fun for the same reason humans can? Wouldn’t that be more… well… fun?

This second argument may be full of it. But it sounds good, eh? Another reason the lots-of-rules approach may beat out the utility-function approach is the ease of critical self-improvement. A utility-function approach is tied to trying to optimize actions, and “write good code” is a ridonkulously hard problem to optimize for. But a lots-of-rules approach intuitively seems like it could critically self-improve with greater ease: it seems like lots of rules will be needed to make an AI a good computer programmer anyhow, and under that assumption the lots-of-rules approach would be better prepared to deal with ad-hoc rules. Is this assumption false? Can you write a good programmer elegantly? Hell if I know. It just feels like if you could, we would have done it.

Basically, utility functions are guaranteed to have a large set of nice properties. However, if humans are made of ad-hoc rules, then when we want a nice property we just add the rule “have this property!” This limits how well we can translate even our own desires into moral guidelines, especially near “strange” areas of our psychology such as comparison cycles. But it also proves that there is a (possibly terrible) alternative to utility functions that intelligences can run on. I’ve tried to make it plausible that the lots-of-rules approach could even be better than the utility-function approach, which makes it worth a bit of consideration before you start your next AI project.

Edit note: Removed apparently disliked sentence. Corrected myself after totally forgetting Arrow’s theorem.

Edit 2, not that this will ever be seen: This thought has obviously been thought before, but maybe not followed this far into hypothetical-land: http://lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers/