Heading Toward Morality

Followup to: Ghosts in the Machine, Fake Fake Utility Functions, Fake Utility Functions

As people were complaining before about not seeing where the quantum physics sequence was going, I shall go ahead and tell you where I’m heading now.

Having dissolved the confusion surrounding the word “could”, the trajectory is now heading toward should.

In fact, I’ve been heading there for a while. Remember the whole sequence on fake utility functions? Back in… well… November 2007?

I sometimes think of there being a train that goes to the Friendly AI station; but it makes several stops before it gets there; and at each stop, a large fraction of the remaining passengers get off.

One of those stops is the one I spent a month leading up to in November 2007, the sequence chronicled in Fake Fake Utility Functions and concluded in Fake Utility Functions.

That’s the stop where someone thinks of the One Great Moral Principle That Is All We Need To Give AIs.

To deliver that one warning, I had to go through all sorts of topics—which topics one might find useful even if not working on Friendly AI. I warned against Affective Death Spirals, which required recursing on the affect heuristic and halo effect, so that your good feeling about one particular moral principle wouldn’t spiral out of control. I did that whole sequence on evolution; and discoursed on the human ability to make almost any goal appear to support almost any policy; I went into evolutionary psychology to argue for why we shouldn’t expect human terminal values to reduce to any simple principle, even happiness, explaining the concept of “expected utility” along the way...

...and talked about genies and more; but you can read the Fake Utility sequence for that.

So that’s just the warning against trying to oversimplify human morality into One Great Moral Principle.

If you want to actually dissolve the confusion that surrounds the word “should”—which is the next stop on the train—then that takes a much longer introduction. Not just one November.

I went through the sequence on words and definitions so that I would be able to later say things like “The next project is to Taboo the word ‘should’ and replace it with its substance”, or “Sorry, saying that morality is self-interest ‘by definition’ isn’t going to cut it here”.

And also the words-and-definitions sequence was the simplest example I knew to introduce the notion of How An Algorithm Feels From Inside, which is one of the great master keys to dissolving wrong questions. Though it seems to us that our cognitive representations are the very substance of the world, they have a character that comes from cognition and often cuts crosswise to a universe made of quarks. E.g. probability; if we are uncertain of a phenomenon, that is a fact about our state of mind, not an intrinsic character of the phenomenon.

Then the reductionism sequence: that a universe made only of quarks does not mean that things of value are lost or even degraded to mundanity. And the notion of how the sum can seem unlike the parts, and yet be as much the parts as our hands are fingers.

Followed by a new example, one step up in difficulty from words and their seemingly intrinsic meanings: “Free will” and seemingly intrinsic could-ness.

But before that point, it was useful to introduce quantum physics. Not just to get to timeless physics and dissolve the “determinism” part of the “free will” confusion. But also, more fundamentally, to break belief in an intuitive universe that looks just like our brain’s cognitive representations. And to present examples of the dissolution of even such fundamental intuitions as those concerning personal identity. And to illustrate the idea that you are within physics, within causality, and that strange things will go wrong in your mind if ever you forget it.

Lately we have begun to approach the final precautions, with warnings against such notions as Author* control: every mind which computes a morality must do so within a chain of lawful causality; it cannot arise from the free will of a ghost in the machine.

And the warning against Passing the Recursive Buck to some meta-morality that is not itself computably specified, or to some meta-morality that is chosen by a ghost without it being programmed in, or to a notion of “moral truth” just as confusing as “should” itself...

And the warning on the difficulty of grasping slippery things like “should”—demonstrating how very easy it will be to just invent another black box equivalent to should-ness, to sweep should-ness under a slightly different rug—or to bounce off into mere modal logics of primitive should-ness...

We aren’t yet at the point where I can explain morality.

But I think—though I could be mistaken—that we are finally getting close to the final sequence.

And if you don’t care about my goal of explanatorily transforming Friendly AI from a Confusing Problem into a merely Extremely Difficult Problem, then stick around anyway. I tend to go through interesting intermediates along my way.

It might seem like confronting “the nature of morality” from the perspective of Friendly AI is only asking for additional trouble.

Artificial Intelligence melts people’s brains. Metamorality melts people’s brains. Trying to think about AI and metamorality at the same time can cause people’s brains to spontaneously combust and burn for years, emitting toxic smoke—don’t laugh, I’ve seen it happen multiple times.

But the discipline imposed by Artificial Intelligence is this: you cannot escape into things that are “self-evident” or “obvious”. That doesn’t stop people from trying, but the programs don’t work. Every thought has to be computed somehow, by transistors made of mere quarks, and not by moral self-evidence to some ghost in the machine.

If what you care about is rescuing children from burning orphanages, I don’t think you will find many moral surprises here; my metamorality adds up to moral normality, as it should. You do not need to worry about metamorality when you are personally trying to rescue children from a burning orphanage. The point at which metamoral issues per se have high stakes in the real world is when you try to compute morality in an AI standing in front of a burning orphanage.

Yet there is also a good deal of needless despair and misguided fear of science, stemming from notions such as, “Science tells us the universe is empty of morality”. This is damage done by a confused metamorality that fails to add up to moral normality. For that I hope to write down a counterspell of understanding. Existential depression has always annoyed me; it is one of the world’s most pointless forms of suffering.

Don’t expect the final post on this topic to come tomorrow, but at least you know where we’re heading.

Part of The Metaethics Sequence

Next post: “No Universally Compelling Arguments”

(start of sequence)