The Thing That I Protect

Followup to: Something to Protect, Value is Fragile

“Something to Protect” discoursed on the idea of wielding rationality in the service of something other than “rationality”. Not just that rationalists ought to pick out a Noble Cause as a hobby to keep them busy; but rather, that rationality itself is generated by having something that you care about more than your current ritual of cognition.

So what is it, then, that I protect?

I quite deliberately did not discuss that in “Something to Protect”, leaving it only as a hanging implication. In the unlikely event that we ever run into aliens, I don’t expect their version of Bayes’s Theorem to be mathematically different from ours, even if they generated it in the course of protecting different and incompatible values. Among humans, the idiom of having “something to protect” is not bound to any one cause, and therefore, to mention my own cause in that post would have harmed its integrity. Causes are dangerous things, whatever their true importance; I have written somewhat on this, and will write more about it.

But still—what is it, then, the thing that I protect?

Friendly AI? No—a thousand times no—a thousand times not anymore. It’s not thinking of the AI that gives me strength to carry on even in the face of inconvenience.

I would be a strange and dangerous AI wannabe if that were my cause—the image in my mind of a perfected being, an existence greater than humankind. Maybe someday I’ll be able to imagine such a child and try to build one, but for now I’m too young to be a father.

Those of you who’ve been following recent discussions, particularly “Value is Fragile”, might have noticed something else that I might, perhaps, hold precious. Smart agents want to protect the physical representation of their utility function for almost the same reason that male organisms are built to be protective of their testicles. From the standpoint of the alien god, natural selection, losing the germline—the gene-carrier that propagates the pattern into the next generation—means losing almost everything that natural selection cares about. Unless you already have children to protect, can protect relatives, etcetera—few are the absolute and unqualified statements that can be made in evolutionary biology—but still, if you happen to be a male human, you will find yourself rather protective of your testicles; that one, centralized vulnerability is why a kick in the testicles hurts more than being hit on the head.

To lose the pattern of human value—which, for now, is physically embodied only in the human brains that care about those values—would be to lose the Future itself; if there’s no agent with those values, there’s nothing to shape a valuable Future.

And this pattern, this one most vulnerable and precious pattern, is indeed at risk of being distorted or destroyed. Growing up is a hard problem either way, whether you try to edit existing brains, or build de novo Artificial Intelligence that mirrors human values. If something more powerful than humans, and not sharing human values, comes into existence—whether by de novo AI gone wrong, or augmented humans gone wrong—then we can expect to lose, hard. And value is fragile; losing just one dimension of human value can destroy nearly all of the utility we expect from the future.

So is that, then, the thing that I protect?

If it were—then what inspired me when times got tough would be, say, thinking of people being nice to each other. Or thinking of people laughing, and contemplating how humor probably exists among only an infinitesimal fraction of evolved intelligent species and their descendants. I would marvel at the power of sympathy to make us feel what others feel—

But that’s not quite it either.

I once attended a small gathering whose theme was “This I Believe”. You could interpret that phrase in a number of ways; I chose “What do you believe that most other people don’t believe which makes a corresponding difference in your behavior?” And it seemed to me that most of how I behaved differently from other people boiled down to two unusual beliefs. The first belief could be summarized as “intelligence is a manifestation of order rather than chaos”; this accounts both for my attempts to master rationality, and my attempt to wield the power of AI.

And the second unusual belief could be summarized as: “Humanity’s future can be a WHOLE LOT better than its past.”

Not desperately Darwinian robots surging out to eat as much of the cosmos as possible, mostly ignoring their own internal values to try to grab as many stars as possible, with most of the remaining matter going into making paperclips.

Not some bittersweet ending where you and I fade away on Earth while the inscrutable robots ride off into the unknowable sunset, having grown beyond such merely human values as love or sympathy.

Screw bittersweet. To hell with that melancholy-tinged crap. Why leave anyone behind? Why surrender a single thing that’s precious?

(And the compromise-futures are all fake anyway; at this difficulty level, you steer precisely or you crash.)

The pattern of fun is also lawful. And, though I do not know all the law—I do think that written in humanity’s value-patterns is the implicit potential of a happy future. A seriously goddamn FUN future. A genuinely GOOD outcome. Not something you’d accept with a sigh of resignation for nothing better being possible. Something that would make you go “WOOHOO!”

In the sequence on Fun Theory, I have given you, I hope, some small reason to believe that such a possibility might be consistently describable, if only it could be made real. How to read that potential out of humans and project it into reality… might or might not be as simple as “superpose our extrapolated reflected equilibria”. But that’s one way of looking at what I’m trying to do—to reach the potential of the GOOD outcome, not the melancholy bittersweet compromise. Why settle for less?

To really have something to protect, it has to be able to bring tears to your eyes. That, generally, requires something concrete to visualize—not just abstract laws. Reading the Laws of Fun doesn’t bring tears to my eyes. I can visualize a possibility or two that makes sense to me, but I don’t know if it would make sense to others the same way.

What does bring tears to my eyes? Imagining a future where humanity has its act together. Imagining children who grow up never knowing our world, who don’t even understand it. Imagining the rescue of those now in sorrow, the end of nightmares great and small. Seeing in reality the real sorrows that happen now, so many of which are unnecessary even now. Seeing in reality the signs of progress toward a humanity that’s at least trying to get its act together and become something more—even if the signs are mostly just symbolic: a space shuttle launch, a march that protests a war.

(And of course these are not the only things that move me. Not everything that moves me has to be a Cause. When I’m listening to, e.g., Bach’s Jesu, Joy of Man’s Desiring, I don’t think about how every extant copy might be vaporized if things go wrong. That may be true, but it’s not the point. It would be as bad as refusing to listen to that melody because it was once inspired by belief in the supernatural.)

To really have something to protect, you have to be able to protect it, not just value it. My battleground for that better Future is, indeed, the fragile pattern of value. Not to keep it in stasis, but to keep it improving under its own criteria rather than randomly losing information. And then to project that through more powerful optimization, to materialize the valuable future. Without surrendering a single thing that’s precious, because losing a single dimension of value could lose it all.

There’s no easy way to do this, whether by de novo AI or by editing brains. But with a de novo AI, cleanly and correctly designed, I think it should at least be possible to get it truly right and win completely. It seems, for all its danger, the safest and easiest and shortest way (yes, the alternatives really are that bad). And so that is my project.

That, then, is the service in which I wield rationality. To protect the Future, on the battleground of the physical representation of value. And my weapon, if I can master it, is the ultimate hidden technique of Bayescraft—to explicitly and fully know the structure of rationality, to such an extent that you can shape the pure form outside yourself—what some call “Artificial General Intelligence” and I call “Friendly AI”. Which is, itself, a major unsolved research problem, and so it calls into play the more informal methods of merely human rationality. That is the purpose of my art and the wellspring of my art.

That’s pretty much all I wanted to say here about this Singularity business...

...except for one last thing; so after tomorrow, I plan to go back to posting about plain old rationality on Monday.