Nonperson Predicates

Followup to: Righting a Wrong Question, Zombies! Zombies?, A Premature Word on AI, On Doing the Impossible

There is a subproblem of Friendly AI which is so scary that I usually don’t talk about it, because very few would-be AI designers would react to it appropriately—that is, by saying, “Wow, that does sound like an interesting problem”, instead of finding one of many subtle ways to scream and run away.

This is the problem that if you create an AI and tell it to model the world around it, it may form models of people that are people themselves. Not necessarily the same person, but people nonetheless.

If you look up at the night sky, and see the tiny dots of light that move over days and weeks—planētoi, the Greeks called them, “wanderers”—and you try to predict the movements of those planet-dots as best you can...

Historically, humans went through a journey as long and as wandering as the planets themselves, to find an accurate model. In the beginning, the models were things of cycles and epicycles, not much resembling the true Solar System.

But eventually we found laws of gravity, and finally built models—even if they were just on paper—accurate enough that Neptune could be deduced by looking at the unexplained perturbation of Uranus from its expected orbit. This required moment-by-moment modeling of where a simplified version of Uranus would be, and the other known planets. Simulation, not just abstraction. Prediction through simplified-yet-still-detailed pointwise similarity.

Suppose you have an AI that is around human beings. And like any Bayesian trying to explain its environment, the AI goes in quest of highly accurate models that predict what it sees of humans.

Models that predict/explain why people do the things they do, say the things they say, want the things they want, think the things they think, and even why people talk about “the mystery of subjective experience”.

The model that most precisely predicts these facts, may well be a ‘simulation’ detailed enough to be a person in its own right.

A highly detailed model of me, may not be me. But it will, at least, be a model which (for purposes of prediction via similarity) thinks itself to be Eliezer Yudkowsky. It will be a model that, when cranked to find my behavior if asked “Who are you and are you conscious?”, says “I am Eliezer Yudkowsky and I seem to have subjective experiences” for much the same reason I do.

If that doesn’t worry you, (re)read “Zombies! Zombies?”.

It seems likely (though not certain) that this happens automatically, whenever a mind of sufficient power to find the right answer, and not otherwise disinclined to create a sentient being trapped within itself, tries to model a human as accurately as possible.

Now you could wave your hands and say, “Oh, by the time the AI is smart enough to do that, it will be smart enough not to”. (This is, in general, a phrase useful in running away from Friendly AI problems.) But do you know this for a fact?

When dealing with things that confuse you, it is wise to widen your confidence intervals. Is a human mind the simplest possible mind that can be sentient? What if, in the course of trying to model its own programmers, a relatively younger AI manages to create a sentient simulation trapped within itself? How soon do you have to start worrying? Ask yourself that fundamental question, “What do I think I know, and how do I think I know it?”

You could wave your hands and say, “Oh, it’s more important to get the job done quickly, than to worry about such relatively minor problems; the end justifies the means. Why, look at all these problems the Earth has right now...” (This is also a general way of running from Friendly AI problems.)

But we may consider and discard many hypotheses in the course of finding the truth, and we are but slow humans. What if an AI creates millions, billions, trillions of alternative hypotheses, models that are actually people, who die when they are disproven?

If you accidentally kill a few trillion people, or permit them to be killed—you could say that the weight of the Future outweighs this evil, perhaps. But the absolute weight of the sin would not be light. If you would balk at killing a million people with a nuclear weapon, you should balk at this.

You could wave your hands and say, “The model will contain abstractions over various uncertainties within it, and this will prevent it from being conscious even though it produces well-calibrated probability distributions over what you will say when you are asked to talk about consciousness.” To which I can only reply, “That would be very convenient if it were true, but how the hell do you know that?” An element of a model marked ‘abstract’ is still there as a computational token, and the interacting causal system may still be sentient.

For these purposes, we do not, in principle, need to crack the entire Hard Problem of Consciousness—the confusion that we name “subjective experience”. We only need to understand enough of it to know when a process is not conscious, not a person, not something deserving of the rights of citizenship. In practice, I suspect you can’t halfway stop being confused—but in theory, half would be enough.

We need a nonperson predicate—a predicate that returns 1 for anything that is a person, and can return 0 or 1 for anything that is not a person. This is a “nonperson predicate” because if it returns 0, then you know that something is definitely not a person.

You can have more than one such predicate, and if any of them returns 0, you’re ok. It just had better never return 0 on anything that is a person, however many nonpeople it returns 1 on.
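That combination rule can be sketched in a few lines of Python. The individual predicates below are placeholder stand-ins, assumed purely for illustration—as the post says, no one actually has a nontrivial nonperson predicate. The sketch only shows the conservative logic: a predicate answers 0 for “definitely not a person” and 1 for “cannot rule it out”.

```python
# Sketch of combining several nonperson predicates, using the post's
# convention: 0 means "definitely not a person", 1 means "cannot rule
# it out".  The predicates themselves are hypothetical placeholders.

def small_lookup_table(model):
    # Placeholder predicate: assume (for illustration only) that a
    # tiny static lookup table is safely not a person.
    return 0 if isinstance(model, dict) and len(model) < 1000 else 1

def bare_linear_weights(model):
    # Placeholder predicate: assume a bare list of regression weights
    # is safely not a person.
    if isinstance(model, list) and all(isinstance(w, float) for w in model):
        return 0
    return 1

NONPERSON_PREDICATES = [small_lookup_table, bare_linear_weights]

def definitely_not_a_person(model):
    # A model is cleared for use iff at least one predicate answers 0.
    # A predicate may answer 1 on any number of nonpeople (costing only
    # efficiency), but must never answer 0 on an actual person.
    return any(pred(model) == 0 for pred in NONPERSON_PREDICATES)
```

Note the asymmetry this encodes: a spurious 1 merely forces the AI back to some safer, more abstract hypothesis, while a mistaken 0 is the catastrophe the whole construction exists to prevent.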

We can even hope that the vast majority of models the AI needs, will be swiftly and trivially approved by a predicate that quickly answers 0. And that the AI would only need to resort to more specific predicates when it is modeling actual people.

With a good toolbox of nonperson predicates in hand, we could exclude all “model citizens”—all beliefs that are themselves people—from the set of hypotheses our Bayesian AI may invent to try to model its person-containing environment.

Does that sound odd? Well, one has to handle the problem somehow. I am open to better ideas, though I will be a bit skeptical about any suggestions for how to proceed that let us cleverly avoid solving the damn mystery.

So do I have a nonperson predicate? No. At least, none that are nontrivial.

This is a challenge that I have not even tried to talk about, with those folk who think themselves ready to challenge the problem of true AI. For they seem to have the standard reflex of running away from difficult problems, and are challenging AI only because they think their amazing insight has already solved it. Just mentioning the problem of Friendly AI by itself, or of precision-grade AI design, is enough to send them fleeing into the night, screaming “It’s too hard! It can’t be done!” If I tried to explain that their job duties might impinge upon the sacred, mysterious, holy Problem of Subjective Experience—

—I’d actually expect to get blank stares, mostly, followed by some instantaneous dismissal which requires no further effort on their part. I’m not sure of what the exact dismissal would be—maybe, “Oh, none of the hypotheses my AI considers, could possibly be a person?” I don’t know; I haven’t bothered trying. But it has to be a dismissal which rules out all possibility of their having to actually solve the damn problem, because most of them would think that they are smart enough to build an AI—indeed, smart enough to have already solved the key part of the problem—but not smart enough to solve the Mystery of Consciousness, which still looks scary to them.

Even if they thought of trying to solve it, they would be afraid of admitting they were trying to solve it. Most of these people cling to the shreds of their modesty, trying at one and the same time to have solved the AI problem while still being humble ordinary blokes. (There’s a grain of truth to that, but at the same time: who the hell do they think they’re kidding?) They know without words that their audience sees the Mystery of Consciousness as a sacred untouchable problem, reserved for some future superbeing. They don’t want people to think that they’re claiming an Einsteinian aura of destiny by trying to solve the problem. So it is easier to dismiss the problem, and not believe a proposition that would be uncomfortable to explain.

Build an AI? Sure! Make it Friendly? Now that you point it out, sure! But trying to come up with a “nonperson predicate”? That’s just way above the difficulty level they signed up to handle.

But a blank map does not correspond to a blank territory. Impossible confusing questions correspond to places where your own thoughts are tangled, not to places where the environment itself contains magic. Even difficult problems do not require an aura of destiny to solve. And the first step to solving one is not running away from the problem like a frightened rabbit, but instead sticking to it long enough to learn something.

So let us not run away from this problem. I doubt it is even difficult in any absolute sense, just a place where my brain is tangled. I suspect, based on some prior experience with similar challenges, that you can’t really be good enough to build a Friendly AI, and still be tangled up in your own brain like that. So it is not necessarily any new effort—over and above that required generally to build a mind while knowing exactly what you are about.

But in any case, I am not screaming and running away from the problem. And I hope that you, dear longtime reader, will not faint at the audacity of my trying to solve it.

Part of The Fun Theory Sequence

Next post: “Nonsentient Optimizers”

Previous post: “Devil’s Offers”