Unnatural Categories

Followup to: Disguised Queries, Superexponential Conceptspace

If a tree falls in the forest, and no one hears it, does it make a sound?

“Tell me why you want to know,” says the rationalist, “and I’ll tell you the answer.” If you want to know whether your seismograph, located nearby, will register an acoustic wave, then the experimental prediction is “Yes”; so, for seismographic purposes, the tree should be considered to make a sound. If instead you’re asking some question about firing patterns in a human auditory cortex—for whatever reason—then the answer is that no such patterns will be changed when the tree falls.

What is a poison? Hemlock is a “poison”; so is cyanide; so is viper venom. Carrots, water, and oxygen are “not poison”. But what determines this classification? You would be hard pressed, just by looking at hemlock and cyanide and carrots and water, to tell what sort of difference is at work. You would have to administer the substances to a human—preferably one signed up for cryonics—and see which ones proved fatal. (And at that, the definition is still subtler than it appears: a ton of carrots, dropped on someone’s head, will also prove fatal. You’re really asking about fatality from metabolic disruption, after administering doses small enough to avoid mechanical damage and blockage, at room temperature, at low velocity.)

Where poison-ness is concerned, you are not classifying via a strictly local property of the substance. You are asking about the consequence when a dose of that substance is applied to a human metabolism. The local difference between a human who gasps and keels over, versus a human alive and healthy, can be discriminated more compactly than any local difference between poison and non-poison.

So we have a substance X, which might or might not be fatally poisonous, and a human Y, and we say—to first order:

“X is classified ‘fatally poisonous’ iff administering X to Y causes Y to enter a state classified ‘dead’.”
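To make the shape of that definition concrete, here is a minimal sketch in Python. The outcome table and function names are invented stand-ins for actually administering the dose and observing the result; the point is only that the classifier never inspects a local property of the substance, just the consequence of applying it.

```python
# Toy stand-in for "administer a small, slow, room-temperature dose
# and observe the resulting state of the organism."
TOY_OUTCOMES = {
    "hemlock": "dead",
    "cyanide": "dead",
    "carrot": "alive",
    "water": "alive",
}

def administer(substance, prior_state="alive"):
    """Return the organism's state after a dose of `substance` (toy model)."""
    return TOY_OUTCOMES.get(substance, prior_state)

def is_fatally_poisonous(substance):
    # X is 'fatally poisonous' iff administering X causes a state
    # classified 'dead' -- a consequence-based, non-local category.
    return administer(substance) == "dead"

print(is_fatally_poisonous("cyanide"))  # True
print(is_fatally_poisonous("carrot"))   # False
```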

Much of the way that we classify things—never mind events—is non-local, entwined with the consequential structure of the world. All the things we would call a chair are all the things that were made for us to sit on. (Humans might even call two molecularly identical objects a “chair” or “a rock shaped like a chair” depending on whether someone had carved it.)

“That’s okay,” you say, “the difference between living humans and dead humans is a nice local property—a compact cluster in Thingspace. Sure, the set of ‘poisons’ might not be as compact a structure. A category X|X->Y may not be as simple as Y, if the causal link → can be complicated. Here, ‘poison’ is not locally compact because of all the complex ways that substances act on the complex human body. But there’s still nothing unnatural about the category of ‘poison’ - we constructed it in an observable, testable way from categories that are themselves simple. If you ever want to know whether something should be called ‘poisonous’, or not, there’s a simple experimental test that settles the issue.”

Hm. What about a purple, egg-shaped, furred, flexible, opaque object? Is it a blegg, and if so, would you call “bleggs” a natural category?

“Sure,” you reply, “because you are forced to formulate the ‘blegg’ category, or something closely akin to it, in order to predict your future experiences as accurately as possible. If you see something that’s purple and egg-shaped and opaque, the only way to predict that it will be flexible is to draw some kind of compact boundary in Thingspace and use that to perform induction. No category means no induction—you can’t see that this object is similar to other objects you’ve seen before, so you can’t predict its unknown properties from its known properties. Can’t get much more natural than that! Say, what exactly would an unnatural property be, anyway?”
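The kind of induction the reply describes can also be sketched in a few lines. The feature encoding and the nearest-neighbor rule below are my own illustrative choices, not anything the dialogue commits to; the point is just that predicting an unobserved property requires lumping the new object with similar objects seen before.

```python
# Crude, discrete "Thingspace": (purple?, egg-shaped?, opaque?) -> flexible?
SEEN_OBJECTS = [
    ((1, 1, 1), True),   # blegg-like objects turned out to be flexible
    ((1, 1, 1), True),
    ((0, 0, 0), False),  # rube-like objects turned out to be rigid
    ((0, 0, 0), False),
]

def distance(a, b):
    """Hamming distance between two feature tuples."""
    return sum(x != y for x, y in zip(a, b))

def predict_flexible(features):
    # Induction via a compact boundary: assume the new object shares the
    # unknown property of the most similar previously seen object.
    nearest_features, nearest_flexible = min(
        SEEN_OBJECTS, key=lambda seen: distance(features, seen[0])
    )
    return nearest_flexible

print(predict_flexible((1, 1, 1)))  # True -- treated as a blegg
```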

Suppose I have a poison P1 that completely destroys one of your kidneys—causes it to just wither away. This is a very dangerous poison, but is it a fatal poison?

“No,” you reply, “a human can live on just one kidney.”

Suppose I have a poison P2 that completely destroys much of a human brain, killing off nearly all the neurons, leaving only enough medullary structure to run the body and keep it breathing, so long as a hospital provides nutrition. Is P2 a fatal poison?

“Yes,” you say, “if your brain is destroyed, you’re dead.”

But this distinction that you now make, between P2 being a fatal poison and P1 being a merely dangerous poison, is not driven by any fundamental requirement of induction. Both poisons destroy organs. It’s just that you care a lot more about the brain than about a kidney. The distinction you drew isn’t driven solely by a desire to predict experience—it’s driven by a distinction built into your utility function. If you have to choose between a dangerous poison and a lethal poison, you will of course take the dangerous poison. From which you induce that if you must choose between P1 and P2, you’ll take P1.

The classification that you drew between “lethal” and “nonlethal” poisons was designed to help you navigate the future—navigate away from outcomes of low utility, toward outcomes of high utility. The boundaries that you drew, in Thingspace and Eventspace, were not driven solely by the structure of the environment—they were also driven by the structure of your utility function: high-utility things lumped together, and low-utility things lumped together. That way you can easily choose actions that lead, in general, to outcomes of high utility, over actions that lead to outcomes of low utility. If you must pick your poison and can only pick one categorical dimension to sort by, you’re going to want to sort the poisons into lower and higher utility—into fatal and dangerous, or dangerous and safe. Whether the poison is red or green is a much more local property, more compact in Thingspace; but it isn’t nearly as relevant to your decision-making.
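A tiny sketch of that sorting, with made-up numbers: the decision-relevant boundary is a cut along the utility axis, while a more compact local property like color tells you nothing about which poison to pick.

```python
# Illustrative outcomes and utilities -- invented numbers, not a model of
# anything; the point is only which sort key drives the choice.
poisons = [
    {"name": "P1", "color": "green", "outcome": "destroys a kidney"},
    {"name": "P2", "color": "green", "outcome": "destroys the brain"},
    {"name": "P3", "color": "red",   "outcome": "mild nausea"},
]

OUTCOME_UTILITY = {
    "mild nausea": -1,
    "destroys a kidney": -1_000,
    "destroys the brain": -1_000_000,
}

def utility(poison):
    return OUTCOME_UTILITY[poison["outcome"]]

# Sorting by color groups the poisons by a compact local property --
# and is useless for deciding which one to swallow.
by_color = sorted(poisons, key=lambda p: p["color"])

# The boundary you actually want ("fatal" vs. "merely dangerous") is a
# threshold on the utility of the outcome.
least_bad = max(poisons, key=utility)
print(least_bad["name"])  # P3
```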

Suppose you have a poison that puts a human, let’s call her Terry, into an extremely damaged state. Her cerebral cortex has turned to mostly fluid, say. So I already labeled that substance a poison; but is it a lethal poison?

This would seem to depend on whether Terry is dead or alive. Her body is breathing, certainly—but her brain is damaged. In the extreme case where her brain was actually removed and incinerated, but her body kept alive, we would certainly have to say that the resultant was no longer a person, from which it follows that the previously existing person, Terry, must have died. But here we have an intermediate case, where the brain is very severely damaged but not utterly destroyed. Where does that poison fall on the border between lethality and unlethality? Where does Terry fall on the border between personhood and nonpersonhood? Did the poison kill Terry or just damage her?

Some things are persons and some things are not persons. It is murder to kill a person who has not threatened to kill you first. If you shoot a chimpanzee who isn’t threatening you, is that murder? How about if you turn off Terry’s life support—is that murder?

“Well,” you say, “that’s fundamentally a moral question—no simple experimental test will settle the issue unless we can agree in advance on which facts are the morally relevant ones. It’s futile to say ‘This chimp can recognize himself in a mirror!’ or ‘Terry can’t recognize herself in a mirror!’ unless we’re agreed that this is a relevant fact—never mind it being the only relevant fact.”

I’ve chosen the phrase “unnatural category” to describe a category whose boundary you draw in a way that sensitively depends on the exact values built into your utility function. The most unnatural categories are typically these values themselves! What is “true happiness”? This is entirely a moral question, because what it really means is “What is valuable happiness?” or “What is the most valuable kind of happiness?” Is having your pleasure center permanently stimulated by electrodes “true happiness”? Your answer to that will tend to center on whether you think this kind of pleasure is a good thing. “Happiness”, then, is a highly unnatural category—there are things that locally bear a strong resemblance to “happiness”, but which are excluded because we judge them as being of low utility, and “happiness” is supposed to be of high utility.

Most terminal values turn out to be unnatural categories, sooner or later. This is why it is so tremendously difficult to decide whether turning off Terry Schiavo’s life support is “murder”.

I don’t mean to imply that unnatural categories are worthless or relative or whatever. That’s what moral arguments are for—for drawing and redrawing the boundaries; which, when it happens with a terminal value, clarifies and thereby changes our utility function.

I have a twofold motivation for introducing the concept of an “unnatural category”.

The first motivation is to recognize when someone tries to pull a fast one during a moral argument, by insisting that no moral argument exists: Terry Schiavo simply is a person because she has human DNA, or she simply is not a person because her cerebral cortex has eroded. There is a super-exponential space of possible concepts, possible boundaries that can be drawn in Thingspace. When we have a predictive question at hand, like “What happens if we run a DNA test on Terry Schiavo?” or “What happens if we ask Terry Schiavo to solve a math problem?”, then we have a clear criterion for which boundary to draw and whether it worked. But when the question at hand is a moral one, a “What should I do?” question, then it’s time to shut your eyes and start doing moral philosophy. Or eyes open, if there are relevant facts at hand—you do want to know what Terry Schiavo’s brain looks like—but the point is that you’re not going to find an experimental test that settles the question, unless you’ve already decided where to draw the boundaries of your utility function’s values.

I think that a major cause of moral panic among Luddites in the presence of high technology is that technology tends to present us with boundary cases on our moral values—raising moral questions that were never previously encountered. In the old days, Terry Schiavo would have stopped breathing long since. But I find it difficult to blame this on technology—it seems to me that there’s something wrong with going into a panic just because you’re being asked a new moral question. Couldn’t you just be asked the same moral question at any time?

If you want to say, “I don’t know, so I’ll strategize conservatively to avoid the boundary case, or treat uncertain people as people,” that’s one argument.

But to say, “AAAIIIEEEE TECHNOLOGY ASKED ME A QUESTION I DON’T KNOW HOW TO ANSWER, TECHNOLOGY IS UNDERMINING MY MORALITY” strikes me as putting the blame in the wrong place.

I should be able to ask you anything, even if you can’t answer. If you can’t answer, then I’m not undermining your morality—it was already undermined.

My second motivation… is to start explaining another reason why Friendly AI is difficult.

I was recently trying to explain to someone why, even if all you wanted to do was fill the universe with paperclips, building a paperclip maximizer would still be a hard problem of FAI theory. Why? Because if you cared about paperclips for their own sake, then you wouldn’t want the AI to fill the universe with things that weren’t really paperclips—as you draw that boundary!

For a human, “paperclip” is a reasonably natural category; it looks like this-and-such and we use it to hold papers together. The “papers” themselves play no direct role in our moral values; we just use them to renew the license plates on our car, or whatever. “Paperclip”, in other words, is far enough away from human terminal values that we tend to draw the boundary using tests that are relatively empirical and observable. If you present us with some strange thing that might or might not be a paperclip, we’ll just see if we can use it to hold papers together. If you present us with some strange thing that might or might not be paper, we’ll see if we can write on it. Relatively simple observable tests.

But there isn’t any equally simple experimental test the AI can perform to find out what is or isn’t a “paperclip”, if “paperclip” is a concept whose importance stems from it playing a direct role in the utility function.

Let’s say that you’re trying to make your little baby paperclip maximizer in the obvious way: showing it a bunch of things that are paperclips, and a bunch of things that aren’t paperclips, including what you consider to be near misses like staples and gluesticks. The AI formulates an internal concept that describes paperclips, and you test it on some more things, and it seems to discriminate the same way you do. So you hook up the “paperclip” concept to the utility function, and off you go!

Soon the AI grows up, kills off you and your species, and begins its quest to transform the universe into paperclips. But wait—now the AI is considering new potential boundary cases of “paperclip” that it didn’t see during its training phase. Boundary cases, in fact, that you never mentioned—let alone showed the AI—because it didn’t occur to you that they were possible. Suppose, for example, that the thought of tiny molecular paperclips had never occurred to you. If it had, you would have agonized for a while—like the way that people agonized over Terry Schiavo—and then finally decided that the tiny molecular paperclip-shapes were not “real” paperclips. But the thought never occurred to you, and you never showed the AI paperclip-shapes of different sizes and told the AI that only one size was correct, during its training phase. So the AI fills the universe with tiny molecular paperclips—but those aren’t real paperclips at all! Alas! There’s no simple experimental test that the AI can perform to find out what you would have decided was or was not a high-utility papercliplike object.
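The training-phase failure described above is easy to reproduce in miniature. The features, training set, and use of a decision tree below are all my own assumptions for illustration; any learner would exhibit the same problem here, because the problem is in the data, not the algorithm: size never varied during training, so the learned boundary cannot encode the size distinction you would have drawn.

```python
# Hypothetical features: (bent_wire?, clamps_sheets_without_piercing?, length_in_meters)
from sklearn.tree import DecisionTreeClassifier

train_X = [
    (1, 1, 0.03),  # ordinary paperclip
    (1, 1, 0.04),  # another ordinary paperclip
    (1, 0, 0.01),  # staple: bent wire, but pierces the paper
    (0, 0, 0.10),  # gluestick: holds paper, but by adhesive
]
train_y = [1, 1, 0, 0]  # 1 = paperclip, 0 = not a paperclip

clf = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)

# Every example the trainer thought to show was centimeter-scale, so the
# learned concept has no opinion about size at all.
molecular_clip = (1, 1, 3e-9)
print(clf.predict([molecular_clip]))  # [1]: "paperclip", as far as the AI knows
```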

What? No simple test? What about: “Ask me what is or isn’t a paperclip, and see if I say ‘Yes’. That’s your new meta-utility function!”

You perceive, I hope, why it isn’t so easy.

If not, here’s a hint:

“Ask”, “me”, and “say ‘Yes’”.