Fake Optimization Criteria

Eliezer Yudkowsky10 Nov 2007 0:10 UTC

66 points

I’ve previously dwelt in considerable length upon forms of rationalization whereby our beliefs appear to match the evidence much more strongly than they actually do. And I’m not overemphasizing the point, either. If we could beat this fundamental metabias and see what every hypothesis really predicted, we would be able to recover from almost any other error of fact.

The mirror challenge for decision theory is seeing which option a choice criterion really endorses. If your stated moral principles call for you to provide laptops to everyone, does that really endorse buying a $1 million gem-studded laptop for yourself, or spending the same money on shipping 5000 OLPCs?

We seem to have evolved a knack for arguing that practically any goal implies practically any action. A phlogiston theorist explaining why magnesium gains weight when burned has nothing on an Inquisitor explaining why God’s infinite love for all His children requires burning some of them at the stake.

There’s no mystery about this. Politics was a feature of the ancestral environment. We are descended from those who argued most persuasively that the good of the tribe meant executing their hated rival Uglak. (We sure ain’t descended from Uglak.)

And yet… is it possible to prove that if Robert Mugabe cared only for the good of Zimbabwe, he would resign from its presidency? You can argue that the policy follows from the goal, but haven’t we just seen that humans can match up any goal to any policy? How do you know that you’re right and Mugabe is wrong? (There are a number of reasons this is a good guess, but bear with me here.)

Human motives are manifold and obscure, our decision processes as vastly complicated as our brains. And the world itself is vastly complicated, on every choice of real-world policy. Can we even prove that human beings are rationalizing—that we’re systematically distorting the link from principles to policy—when we lack a single firm place on which to stand? When there’s no way to find out exactly what even a single optimization criterion implies? (Actually, you can just observe that people disagree about office politics in ways that strangely correlate to their own interests, while simultaneously denying that any such interests are at work. But again, bear with me here.)

Where is the standardized, open-source, generally intelligent, consequentialist optimization process into which we can feed a complete morality as an XML file, to find out what that morality really recommends when applied to our world? Is there even a single real-world case where we can know exactly what a choice criterion recommends? Where is the pure moral reasoner—of known utility function, purged of all other stray desires that might distort its optimization—whose trustworthy output we can contrast to human rationalizations of the same utility function?

Why, it’s our old friend the alien god, of course! Natural selection is guaranteed free of all mercy, all love, all compassion, all aesthetic sensibilities, all political factionalism, all ideological allegiances, all academic ambitions, all libertarianism, all socialism, all Blue and all Green. Natural selection doesn’t maximize its criterion of inclusive genetic fitness—it’s not that smart. But when you look at the output of natural selection, you are guaranteed to be looking at an output that was optimized only for inclusive genetic fitness, and not the interests of the US agricultural industry.

In the case histories of evolutionary science—in, for example, The Tragedy of Group Selectionism—we can directly compare human rationalizations to the result of pure optimization for a known criterion. What did Wynne-Edwards think would be the result of group selection for small subpopulation sizes? Voluntary individual restraint in breeding, and enough food for everyone. What was the actual laboratory result? Cannibalism.

Now you might ask: Are these case histories of evolutionary science really relevant to human morality, which doesn’t give two figs for inclusive genetic fitness when it gets in the way of love, compassion, aesthetics, healing, freedom, fairness, et cetera? Human societies didn’t even have a concept of “inclusive genetic fitness” until the 20th century.

But I ask in return: If we can’t see clearly the result of a single monotone optimization criterion—if we can’t even train ourselves to hear a single pure note—then how will we listen to an orchestra? How will we see that “Always be selfish” or “Always obey the government” are poor guiding principles for human beings to adopt—if we think that even optimizing genes for inclusive fitness will yield organisms which sacrifice reproductive opportunities in the name of social resource conservation?

To train ourselves to see clearly, we need simple practice cases.

What links here?

Eliezer Yudkowsky10 Nov 2007 0:10 UTC

66 points

21 comments3 min readLW link Archive

Optimization Motivated Reasoning

Stefan_Pernar 10 Nov 2007 0:54 UTC
0 points
Evolution does not stop on the genetic level but continues on the <a href=”a href=”http://www.jame5.com/?p=23“>cognitive level allowing for a far higher complexity and speed. As a result group selection becomes intuitively obvious although on the cognitive level members of weaker groups have of cause in principle the chance to change their minds aka evolve their beliefs before physical annihilation.

“If we can’t see clearly the result of a single monotone optimization criterion”

We can project where ever increasing fitness leads up to and it is up to us to make sure we will have a place in such a future.
Aaron_Luchko 10 Nov 2007 8:14 UTC
−2 points
I think the problem with trying to come up with a concrete definition of morality is the only real problems are ones without real solutions. In science we can solve previously unknown problems because we’re constantly building on newly discovered knowledge. But with morality the basic situations have existed mostly unchanged for most of our evolution and we don’t have any real advantage over previous generations, thus any problem worth solving is there because we can’t solve it.

For instance you’re never going to get a leader who’s complete moral argument for governing is “I should lead this country because I randomly murder people in horrible ways”. Any leader like that will never gain enough supporters to form a government, sure there are leaders who essentially lead in that fashion but they always have some idealist justification for why they should lead.

Thus you can’t set down laws like “Always be selfish” or “Always obey the government” since if it’s not completely obvious and universal you wouldn’t be interested in that question.

However you can set down a moral law like “Don’t torture a thousand people to death to achieve the same amount of satisfaction you’d get from eating a strawberry unless there are an unbelievably contrived set of extenuating circumstances involved, probably something involving the number 3^^^3”. However, one would hope that’s already part of your moral code...
Pete_Carlton 10 Nov 2007 9:55 UTC
−1 points
Where is the standardized, open-source, generally intelligent, consequentialist optimization process into which we can feed a complete morality as an XML file, to find out what that morality really recommends when applied to our world?

We have reasons to think this step will never be easy. If you imagine that this file, like most files, is something like version 2.1.8, who is going to make the decision to make this version “count”, instead of waiting to see what comes out of the tests underway in version 2.1.9? By what moral critera will we decide upon a standard morality file? Of course, Nietzsche also foresaw this problem, and Dennett points out that it’s still a big problem despite how much we’ve learned about what humans are, but he does not proffer a solution to it. Do we just want the utility function currently in vogue to win out? When will we be satisfied we’ve got the right one?

Or will evolution (i.e., force) settle it?
Stefan_Pernar 10 Nov 2007 13:43 UTC
0 points
Aaron Luchko, I argue that morality can be universally defined. You can find my thoughts in my paper on friendly AI theory? Would love to hear your comments.

Somehow the links in my earlier comment got messed up.

For the link behind ‘cognitive evolution’ see: http://www.jame5.com/?p=23 For the link behind ‘make sure we will have a place’ see: http://www.jame5.com/?p=17
Caledonian2 10 Nov 2007 15:54 UTC
−2 points
it is up to us to make sure we will have a place in such a future

The thing about post-humanity is that it will not have humanity in it. It’s up to us to make sure that post-humanity comes into existence. This necessarily involves the obsolescence of human beings.

The future we must build necessarily cannot have a place for us in it. That’s the point! The acorn does not survive the creation of the oak.
gutzperson 10 Nov 2007 18:22 UTC
0 points
Quote “it is up to us to make sure we will have a place in such a future”
- Question; Are you thinking about Moravec’s fantasies of collective (sub)consciousness spread through the universe?
Quote “The thing about post-humanity is that it will not have humanity in it. It’s up to us to make sure that post-humanity comes into existence. This necessarily involves the obsolescence of human beings. The future we must build necessarily cannot have a place for us in it. That’s the point! The acorn does not survive the creation of the oak.”

You’re kidding. Post-history is fiction, and history is well and alive. Post-humanity as you describe it, is okay for science fiction. If you want to read something intelligent about post-humanism, please read Katherine Hayles: The Human in the Posthuman. Moravec with his fantasies of extracts of a grey collective brain mass welded together in a post-human orgy of whisper and thought(lessness) makes me weep. His is a truly religiously motivated afterlife fantasy. And yours has got some mystical/mythical aspects, too. Oaks?!?!
Stefan_Pernar 10 Nov 2007 23:28 UTC
0 points
Caledonian, yes—I agree 100% - the tricky part is getting to post humanity—avoiding a non-friendly AI. That would be a future where we have a place in the sense that we will have evolved further.

gutzperson, today you are gutzperson—tomorrow you are post-gutzperson yesterday—ensuring your continued existence in that sense will lead to your eventual transcendence. Same for everyone else—just don’t extinguish that strand.
gutzperson 11 Nov 2007 10:18 UTC
0 points
Let’s say I am super-gutzperson, beyond post- and past. I am all for utopia. I am all for AI and whatever will come. I am also for co-existence. I am amazed about a species that so happily prepares for their own extinction or replacement. Would you like to test post-evolution on mice and replace them with post-mice? I actually love my body and would like future generations of humans to be able to enjoy this too. As Hayles says in so many words, post-human does not mean without humans. This was my message to Caledonian.
Caledonian2 11 Nov 2007 13:28 UTC
−1 points
Caledonian, yes—I agree 100% - the tricky part is getting to post humanity—avoiding a non-friendly AI.

That’s not what I’m saying at all.
Gray_Area 11 Nov 2007 23:13 UTC
0 points
Stefan Pernar said: “I argue that morality can be universally defined.”

As Eliezer points out, evolution is blind, and so ‘fitness’ can have as a side-effect what we would intuitively consider unimaginable moral horrors (much worse than parasitic wasps and cats playing with their food). I think if you want to define ‘the Good’ in the way you do, you need to either explain how such horrors are to be avoided, or educate the common intuition.
Stefan_Pernar 12 Nov 2007 6:16 UTC
0 points
Caledonian, sorry—do you mean that humanity needs to be superseded?

Gray Area, did you read my paper on friendly AI yet? I must be sounding like a broken record by now ;-)

I justify my statement ‘that is good what increases fitness’ with the axiomatic belief of ‘to exist is preferable over not to exist’

The phenomena created by evolution that seem like horrors to us (parasitic wasps) must be that particular wasp’s pinnacle of joy. It is a matter of perspective. I am not saying: eat crap—millions of flies can’t be wrong! I am taking the human perspective—not that of a wasp or a fly or of random horror inducing entity but can understand other entities points of view and see the general principle—what increases fitness of an entity is good for that entity. Generally put: that is good what increases fitness.
Ben_Jones 7 Dec 2007 11:49 UTC
1 point
So should I compare my every action with what natural selection/Alien God tells me I should do?

If I’m following you Eliezer, I should be thinking ‘the Alien God tells me I should counter this argument with a right hook to the chops. My rationality tells me I shouldn’t. Whither this difference? Is it a bias; rationalisation, false justification; or is it some ethereal, abstract entity known as ‘morality’? To whom should I listen, and why?′

Should I internalise the disinterested Alien God so that I can see reality ‘for what it truly is’, or so that I’m more likely to pass my genes on? Or both?
JulianMorrison 7 Dec 2007 12:30 UTC
2 points
Caledonian: Posthumanity can easily be humanity plus X, where X is some conglomeration of augmentations, backups, and—later—full replacements. This is a “ship of Theseus”. Humanity can easily survive the total replacement and augmentation of its parts because humanity is, at root, the myriad of terminal values that we inherit and learn. You can take the mind out of the meat, but you can’t take the meat out of the mind.

I expect I will live through a continuity of mind to age 1000 and beyond.

I do not expect the 1000 year old me will look or feel recognizably like unmodified humanity except as a deliberate effort.
MrHen 17 Mar 2010 19:02 UTC
1 point
What is the point of this post? I seem to have missed it entirely. Can anyone help me out?

The mirror challenge for decision theory is seeing which option a choice criterion really endorses. If your stated moral principles call for you to provide laptops to everyone, does that really endorse buying a $1 million gem-studded laptop for yourself, or spending the same money on shipping 5000 OLPCs?

Is the point that predicting the end result of particular criterion is difficult because bias gets in the way? And, because it is difficult, start small with stuff like gene fitness and work up to bigger problems like social ethics?

Where is the pure moral reasoner [...] whose trustworthy output we can contrast to human rationalizations of the same utility function? [...] Why, it’s our old friend the alien god, of course! Natural selection is guaranteed free of all mercy, all love, all compassion, all aesthetic sensibilities, all political factionalism, all ideological allegiances, all academic ambitions, all libertarianism, all socialism, all Blue and all Green.

Or… is the point that natural selection is a great way to expose the biases at work in our ethics choice criterion?

I am not tracking on something here. This is a summary of the points in the post as I see them:
- We are unable to accurately study how closely the results of our actions match our own predictions of those results.
- The equivalent problem in decision theory is that we are unable to take a set of known choice criteria and predict which choice will be made given a particular environment. In other words, we think we know what we would/should do in event X but we are wrong.
- We possess the ability to predict any particular action from all possible choice criteria.
- Is it possible to prove that a particular action does or does not follow from certain choice criteria, thereby avoiding our tendency to predict anything from everything?
- We need a bias free system to study that allows us to measure our predictions without interfering with the result of the system.
- Natural selection presents a system whose only “goal” is inclusive genetic fitness. There is no bias.
- Examples show that our predictions of natural selection reveal biases in ourselves. Therefore, our predictions were biased.
- To remove our bias with regards to human ethics, we should use natural selection as a calibration tool.
I feel like the last point skipped over a few points. As best as I can tell, these belong just before the last point:
- When our predictions of the bias-proof system are accurate, they will be predictions without bias.
- Using the non-biased predictors we found to study the bias-proof system, we can study other systems with less bias.
Using this outline, it seems like the takeaway is, “Don’t study ethics until after you studied natural selection because there is too much bias involved in studying ethics.”

Can someone tell me if I am correct? A simple yes or no is cool if you don’t feel like typing up a whole lot. Even, “No, not even close,” will give me more information than I have right now.
- thomblake 17 Mar 2010 19:11 UTC
  5 points
  Parent
  Seems about right. Note: “To train ourselves to see clearly, we need simple practice cases.”
Ender 23 Aug 2011 21:37 UTC
0 points
The mention of music and evolution sent me off on a tangent, which was to wonder why human brains have a sense of music. A lot of music theory makes mathematical sense (the overtone series), but it seems odd from an evolution standpoint that musicianship was a good allele to have.
- ialdabaoth 2 May 2013 16:07 UTC
  4 points
  Parent
  I believe the current theory is that musical talent was a sexual selection criteria that ‘blew up’. Good rhythm, a good singing voice, and an ability to remember complex rhythm were originally linked to timing and muscle coordination, and so helped to signal for hunting fitness; and to intelligence, and so helped to signal for the ability to navigate the pack’s social landscape. But once sexual selection for a trait begins, that trait can take on a life of its own, leading to things like peacocks’ tails and lyre bird’s mating calls.
  - Ender 28 Jul 2013 23:42 UTC
    1 point
    Parent
    This article from 2005 says that while there are some different theories about the evolution of music, there is not enough evidence yet to reach a conclusion. http://www.cns.nyu.edu/~jhm/mcdermott_hauser_mp.pdf
    
    In another article, Geoffrey F. Miller explained that Darwin hypothesized that hominids might have included some music in their courtship, similar to birdsong, before the development of language. Darwin’s theory is described pretty clearly in the refrain of “Who Put the Bomp,” but you can also google the article.
    
    G. F. (2000). Evolution of human music through sexual selection. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music, MIT Press, pp. 329-360.
Ford 22 Feb 2013 19:59 UTC
2 points
You might like the “simple practice cases” in my recently published book, Darwinian Agriculture. Has natural selection favored solar tracking by leaves because it increases photosynthesis, or because it decreases the photosynthesis of competitors? What sex ratio (in reindeer, say) is favored by natural selection, and what sex ratio maximizes meat production from a given amount of lichen? Why do rhizobial bacteria provide their legume hosts with nitrogen, if healthier plants will indirectly help other rhizobia infecting the same plant—their most-likely competitors for the next host?
christopherj 20 Oct 2013 22:25 UTC
0 points
So what are the forces at play in this scenario? Evolution will tend toward optimizing the replication of anything that replicates; the environment consists of inanimate objects and other entities that evolution is tending to optimize; each entity is optimized at an individual level despite interactions with other replicating entities. There is no foresight. Evolution will show no favoritism among the various replicating entities.

Gene fragments, genes, organelles, cells, individuals, family, and society are all replicating entities, and all can be selected for—though selection for each of the components occurs first and more often, and entirely uninterested in higher levels of selection. Information encoded in neural networks is also a replicating entity.

Consider for example a human: ~3.2 billion base pairs of DNA, comprising ~25,000 genes, on 23 chromosomes, and all the previous mostly doubled to make a diploid cell. Mendelian reproduction serves to enforce some cooperation of the various genes. Early differentiation of cells into reproductive and somatic cells serves to enforce cooperation at the cell level; somatic cells won’t reproduce indefinitely, but can assist the reproduction of the gamete cells. These mechanisms work pretty well, though despite their severity there are exceptions—for example meiotic drive and retrotransposons allow genes to cheat Mendelian reproduction, and transmissible cancer as seen devastating the Tasmanian devils shows cells can successfully go rouge. Social enforcement mechanisms exist, but are mild compared to the aforementioned methods.

Humans also contain information stored in the brain, which can be modified and transmitted (though a proper model of that would be like creating an artificial general intelligence). Ideas are not tied to the genes, and are transmitted independently of the genes of the humans holding them—so why shouldn’t there be ideas that act in opposition to the genes of the human holding them? It would be quite the achievement for evolution to produce humans immune to ideas harmful to their genes, while still keeping the enormously useful capability to generate and transmit ideas.

As a side note, consider the search space of evolution. The request, “Find the strand of DNA size 3.2 billion base pairs in length, that is optimal for reproduction in [this environment]” consists of a search space of over 4^3,200,000,000. (And the actual search space is indefinitely larger.) Even an entity with access to the combined resources of the entire universe isn’t going to be able to look through that search space.
tlhonmey 6 Jun 2022 17:19 UTC
1 point
Personally I think the Inquisitor has a much better case than the Phlogiston theorist.
If humans have an immortal soul, then saving that soul from an eternity of torment would easily justify nearly anything temporarily inflicted on the mortal body in the same manner that saving someone’s life from a burst appendix justifies slicing open their belly. While brutal, the Inquisitor is self-consistent. Or, at least, he could be.
Magnesium gaining weight when burned, however, has to be special-cased away to fit with Phlogiston theory. There aren’t really any coherent explanations for it that don’t boil down to “Magnesium doesn’t count.”
Still, it’s a good example of the lengths to which people will go to justify their own preferred courses of action. The Inquisition was, after all, largely political rather than religious, concerned with rooting the last of the Moorish sympathizers out of Spain.