Free to Optimize

Stare decisis is the legal principle which binds courts to follow precedent, retracing the footsteps of other judges’ decisions. As someone previously condemned to an Orthodox Jewish education, where I gritted my teeth at the idea that medieval rabbis would always be wiser than modern rabbis, I completely missed the rationale for stare decisis. I thought it was about respect for the past.

But shouldn’t we presume that, in the presence of science, judges closer to the future will know more—have new facts at their fingertips—which enable them to make better decisions? Imagine if engineers respected the decisions of past engineers, not as a source of good suggestions, but as a binding precedent!—That was my original reaction. The standard rationale behind stare decisis came as a shock of revelation to me; it considerably increased my respect for the whole legal system.

This rationale is jurisprudence constante: The legal system must above all be predictable, so that people can execute contracts or choose behaviors knowing the legal implications.

Judges are not necessarily there to optimize, like an engineer. The purpose of law is not to make the world perfect. The law is there to provide a predictable environment in which people can optimize their own futures.

I was amazed at how a principle that at first glance seemed so completely Luddite could have such an Enlightenment rationale. It was a “shock of creativity”—a solution that ranked high in my preference ordering and low in my search ordering, a solution that violated my previous surface generalizations. “Respect the past just because it’s the past” would not have easily occurred to me as a good solution for anything.

There’s a peer commentary in Evolutionary Origins of Morality which notes in passing that “other things being equal, organisms will choose to reward themselves over being rewarded by caretaking organisms”. It’s cited as the Premack principle, but the actual Premack principle looks to be something quite different, so I don’t know if this is a bogus result, a misremembered citation, or a nonobvious derivation. If true, it’s definitely interesting from a fun-theoretic perspective.

Optimization is the ability to squeeze the future into regions high in your preference ordering. Living by my own strength means squeezing my own future—not perfectly, but still being able to grasp some of the relation between my actions and their consequences. This is the strength of a human.

If I’m being helped, then some other agent is also squeezing my future—optimizing me—in the same rough direction that I try to squeeze myself. This is “help”.

A human helper is unlikely to steer every part of my future that I could have steered myself. They’re not likely to have already exploited every connection between action and outcome that I can myself understand. They won’t be able to squeeze the future that tightly; there will be slack left over, that I can squeeze for myself.

We have little experience with being “caretaken” across any substantial gap in intelligence; the closest thing that human experience provides us with is the idiom of parents and children. Human parents are still human; they may be smarter than their children, but they can’t predict the future or manipulate the kids in any fine-grained way.

Even so, it’s an empirical observation that some human parents do help their children so much that their children don’t become strong. It’s not that there’s nothing left for their children to do, but with a hundred million dollars in a trust fund, they don’t need to do much—their remaining motivations aren’t strong enough. Something like that depends on genes, not just environment—not every overhelped child shrivels—but conversely it depends on environment too, not just genes.

So, in considering the kind of “help” that can flow from relatively stronger agents to relatively weaker agents, we have two potential problems to track:

  1. Help so strong that it optimizes away the links between the desirable outcome and your own choices.

  2. Help that is believed to be so reliable that it takes off the psychological pressure to use your own strength.

Since (2) revolves around belief, could you just lie about how reliable the help was? Pretend that you’re not going to help when things get bad—but then if things do get bad, you help anyway? That trick didn’t work too well for Alan Greenspan and Ben Bernanke.

A superintelligence might be able to pull off a better deception. But in terms of moral theory and eudaimonia—we are allowed to have preferences over external states of affairs, not just psychological states. This applies to “I want to really steer my own life, not just believe that I do”, just as it applies to “I want to have a love affair with a fellow sentient, not just a puppet that I am deceived into thinking sentient”. So if we can state firmly from a value standpoint that we don’t want to be fooled this way, then building an agent which respects that preference is a mere matter of Friendly AI.

Modify people so that they don’t relax when they believe they’ll be helped? I usually try to think of how to modify environments before I imagine modifying any people. It’s not that I want to stay the same person forever; but the issues are rather more fraught, and one might wish to take it slowly, at some eudaimonic rate of personal improvement.

(1), though, is the most interesting issue from a philosophicalish standpoint. It impinges on the confusion named “free will”, which I have already untangled; see the posts referenced at top, if you’re recently joining OB.

Let’s say that I’m an ultrapowerful AI, and I use my knowledge of your mind and your environment to forecast that, if left to your own devices, you will make $999,750. But this does not satisfice me; it so happens that I want you to make at least $1,000,000. So I hand you $250, and then you go on to make $999,750 as you ordinarily would have.

How much of your own strength have you just lived by?

The first view would say, “I made 99.975% of the money; the AI only helped 0.025% worth.”

The second view would say, “Suppose I had entirely slacked off and done nothing. Then the AI would have handed me $1,000,000. So my attempt to steer my own future was an illusion; my future was already determined to contain $1,000,000.”

Someone might reply, “Physics is deterministic, so your future is already determined no matter what you or the AI does—”

But the second view interrupts and says, “No, you’re not confusing me that easily. I am within physics, so in order for my future to be determined by me, it must be determined by physics. The Past does not reach around the Present and determine the Future before the Present gets a chance—that is mixing up a timeful view with a timeless one. But if there’s an AI that really does look over the alternatives before I do, and really does choose the outcome before I get a chance, then I’m really not steering my own future. The future is no longer counterfactually dependent on my decisions.”

At which point the first view butts in and says, “But of course the future is counterfactually dependent on your actions. The AI gives you $250 and then leaves. As a physical fact, if you didn’t work hard, you would end up with only $250 instead of $1,000,000.”

To which the second view replies, “I one-box on Newcomb’s Problem, so my counterfactual reads ‘if my decision were to not work hard, the AI would have given me $1,000,000 instead of $250’.”

“So you’re saying,” says the first view, heavy with sarcasm, “that if the AI had wanted me to make at least $1,000,000 and it had ensured this through the general policy of handing me $1,000,000 flat on a silver platter, leaving me to earn $999,750 through my own actions, for a total of $1,999,750—that this AI would have interfered less with my life than the one who just gave me $250.”

The second view thinks for a second and says, “Yeah, actually. Because then there’s a stronger counterfactual dependency of the final outcome on your own decisions. Every dollar you earned was a real added dollar. The second AI helped you more, but it constrained your destiny less.”

“But if the AI had done exactly the same thing, because it wanted me to make exactly $1,999,750—”

The second view nods.

“That sounds a bit scary,” the first view says, “for reasons which have nothing to do with the usual furious debates over Newcomb’s Problem. You’re making your utility function path-dependent on the detailed cognition of the Friendly AI trying to help you! You’d be okay with it if the AI could only give you $250. You’d be okay if the AI had decided to give you $250 through a decision process that had predicted the final outcome in less detail, even though you acknowledge that in principle your decisions may already be highly deterministic. How is a poor Friendly AI supposed to help you, when your utility function is dependent, not just on the outcome, not just on the Friendly AI’s actions, but dependent on differences of the exact algorithm the Friendly AI uses to arrive at the same decision? Isn’t your whole rationale of one-boxing on Newcomb’s Problem that you only care about what works?”

“Well, that’s a good point,” says the second view. “But sometimes we only care about what works, and yet sometimes we do care about the journey as well as the destination. If I was trying to cure cancer, I wouldn’t care how I cured cancer, or whether I or the AI cured cancer, just so long as it ended up cured. This isn’t that kind of problem. This is the problem of the eudaimonic journey—it’s the reason I care in the first place whether I get a million dollars through my own efforts or by having an outside AI hand it to me on a silver platter. My utility function is not up for grabs. If I desire not to be optimized too hard by an outside agent, the agent needs to respect that preference even if it depends on the details of how the outside agent arrives at its decisions. Though it’s also worth noting that decisions are produced by algorithms—if the AI hadn’t been using the algorithm of doing just what it took to bring me up to $1,000,000, it probably wouldn’t have handed me exactly $250.”
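
To make the arithmetic behind the two views concrete, here is a minimal Python sketch. It is purely illustrative: the dollar amounts are the ones used in the dialogue above, while the policy names and the two metrics are hypothetical labels I am using to summarize the first view (the fraction of the outcome earned by your own strength) and the second view (one-boxing-style counterfactual dependence).

```python
# Illustrative sketch only: the dollar figures come from the example above;
# the function names and the two "views" as metrics are labels of convenience.

OWN_EARNINGS_IF_WORKING = 999_750   # what you make if left to your own devices
OWN_EARNINGS_IF_SLACKING = 0        # what you make if you entirely slack off
TARGET = 1_000_000                  # the AI wants you to end up with at least this


def top_up_ai(own_earnings):
    """Policy of the first AI: hand over just enough to reach the target."""
    return own_earnings + max(TARGET - own_earnings, 0)


def flat_gift_ai(own_earnings):
    """Policy of the second AI: hand over $1,000,000 regardless of what you do."""
    return own_earnings + TARGET


def first_view(policy):
    """Fraction of the final outcome accounted for by your own earnings."""
    return OWN_EARNINGS_IF_WORKING / policy(OWN_EARNINGS_IF_WORKING)


def second_view(policy):
    """Counterfactual dependence of the final outcome on your decision to work,
    given the AI's policy (the one-boxing counterfactual from the dialogue)."""
    return policy(OWN_EARNINGS_IF_WORKING) - policy(OWN_EARNINGS_IF_SLACKING)


print(first_view(top_up_ai))      # 0.99975 -> "I made 99.975% of the money"
print(second_view(top_up_ai))     # 0       -> outcome fixed at $1,000,000 either way
print(second_view(flat_gift_ai))  # 999750  -> every earned dollar is a real added dollar
```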

The desire not to be optimized too hard by an outside agent is one of the structurally nontrivial aspects of human morality.

But I can think of a solution which, unless it contains some terrible flaw not obvious to me, sets a lower bound on the goodness of a solution: any alternative solution adopted ought to be at least this good or better.

If there is anything in the world that resembles a god, people will try to pray to it. It’s human nature to such an extent that people will pray even if there aren’t any gods—so you can imagine what would happen if there were! But people don’t pray to gravity to ignore their airplanes, because it is understood how gravity works, and it is understood that gravity doesn’t adapt itself to the needs of individuals. Instead they understand gravity and try to turn it to their own purposes.

So one possible way of helping—which may or may not be the best way of helping—would be the gift of a world that works on improved rules, where the rules are stable and understandable enough that people can manipulate them and optimize their own futures together. A nicer place to live, but free of meddling gods beyond that. I have yet to think of a form of help that is less poisonous to human beings—but I am only human.

Added: Note that modern legal systems score a low Fail on this dimension—no single human mind can even know all the regulations any more, let alone optimize for them. Maybe a professional lawyer who did nothing else could memorize all the regulations applicable to them personally, but I doubt it. As Albert Einstein observed, any fool can make things more complicated; what takes intelligence is moving in the opposite direction.

Part of The Fun Theory Sequence

Next post: “Harmful Options”

Previous post: “Living By Your Own Strength”