Ends Don’t Justify Means (Among Humans)

“If the ends don’t justify the means, what does?”
—variously attributed

“I think of myself as running on hostile hardware.”
—Justin Corwin

Yesterday I talked about how humans may have evolved a structure of political revolution, beginning by believing themselves morally superior to the corrupt current power structure, but ending by being corrupted by power themselves—not by any plan in their own minds, but by the echo of ancestors who did the same and thereby reproduced.

This fits the template:

In some cases, human beings have evolved in such fashion as to think that they are doing X for prosocial reason Y, but when human beings actually do X, other adaptations execute to promote self-benefiting consequence Z.

From this proposition, I now move on to my main point, a question considerably outside the realm of classical Bayesian decision theory:

“What if I’m running on corrupted hardware?”

In such a case as this, you might even find yourself uttering such seemingly paradoxical statements—sheer nonsense from the perspective of classical decision theory—as:

“The ends don’t justify the means.”

But if you are running on corrupted hardware, then the reflective observation that it seems like a righteous and altruistic act to seize power for yourself—this seeming may not be much evidence for the proposition that seizing power is in fact the action that will most benefit the tribe.

By the power of naive realism, the corrupted hardware that you run on, and the corrupted seemings that it computes, will seem like the fabric of the very world itself—simply the way-things-are.

And so we have the bizarre-seeming rule: “For the good of the tribe, do not cheat to seize power even when it would provide a net benefit to the tribe.”

Indeed it may be wiser to phrase it this way: If you just say, “when it seems like it would provide a net benefit to the tribe”, then you get people who say, “But it doesn’t just seem that way—it would provide a net benefit to the tribe if I were in charge.”
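The logic of this rule can be sketched as a toy simulation. This is entirely my own illustration, not from the post, and the payoff distribution and bias size are arbitrary assumptions: if “how beneficial this power-grab seems” equals the true benefit plus a large upward bias, an agent who acts whenever a grab seems net-positive does worse for the tribe than one who follows the flat prohibition.

```python
import random

random.seed(0)

# Toy model (my construction): each opportunity to "cheat and seize
# power" has a true net benefit to the tribe, usually negative. The
# corrupted hardware reports that benefit with a large upward bias.
BIAS = 2.0    # how much better power-grabs *seem* than they are (assumed)
NOISE = 1.0   # ordinary estimation noise (assumed)
N = 100_000   # number of opportunities simulated

naive_total = 0.0  # policy: act whenever the grab *seems* net-positive
rule_total = 0.0   # policy: never cheat to seize power, full stop

for _ in range(N):
    true_benefit = random.gauss(-1.0, 1.0)  # usually harms the tribe
    seeming = true_benefit + BIAS + random.gauss(0.0, NOISE)
    if seeming > 0:
        naive_total += true_benefit
    # the deontological rule always declines, contributing exactly 0

print(f"naive consequentialist: {naive_total:+.0f}")
print(f"deontological rule:     {rule_total:+.0f}")
```

The particular numbers don’t matter; the point is that once the estimator is known to be biased in a specific direction, a rule that ignores the estimate entirely can outperform a policy that trusts it.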

The notion of untrusted hardware seems like something wholly outside the realm of classical decision theory. (What it does to reflective decision theory I can’t yet say, but that would seem to be the appropriate level to handle it.)

But on a human level, the patch seems straightforward. Once you know about the warp, you create rules that describe the warped behavior and outlaw it. A rule that says, “For the good of the tribe, do not cheat to seize power even for the good of the tribe.” Or “For the good of the tribe, do not murder even for the good of the tribe.”

And now the philosopher comes and presents their “thought experiment”—setting up a scenario in which, by stipulation, the only possible way to save five innocent lives is to murder one innocent person, and this murder is certain to save the five lives. “There’s a train heading to run over five innocent people, who you can’t possibly warn to jump out of the way, but you can push one innocent person into the path of the train, which will stop the train. These are your only options; what do you do?”

An altruistic human, who has accepted certain deontological prohibitions—which seem well justified by some historical statistics on the results of reasoning in certain ways on untrustworthy hardware—may experience some mental distress on encountering this thought experiment.

So here’s a reply to that philosopher’s scenario, which I have yet to hear any philosopher’s victim give:

“You stipulate that the only possible way to save five innocent lives is to murder one innocent person, and this murder will definitely save the five lives, and that these facts are known to me with effective certainty. But since I am running on corrupted hardware, I can’t occupy the epistemic state you want me to imagine. Therefore I reply that, in a society of Artificial Intelligences worthy of personhood and lacking any inbuilt tendency to be corrupted by power, it would be right for the AI to murder the one innocent person to save five, and moreover all its peers would agree. However, I refuse to extend this reply to myself, because the epistemic state you ask me to imagine can only exist among other kinds of people than human beings.”

Now, to me this seems like a dodge. I think the universe is sufficiently unkind that we can justly be forced to consider situations of this sort. The sort of person who goes around proposing that sort of thought experiment might well deserve that sort of answer. But any human legal system does embody some answer to the question “How many innocent people can we put in jail to get the guilty ones?”, even if the number isn’t written down.

As a human, I try to abide by the deontological prohibitions that humans have made to live in peace with one another. But I don’t think that our deontological prohibitions are literally inherently nonconsequentially terminally right. I endorse “the end doesn’t justify the means” as a principle to guide humans running on corrupted hardware, but I wouldn’t endorse it as a principle for a society of AIs that make well-calibrated estimates. (If you have one AI in a society of humans, that does bring in other considerations, like whether the humans learn from your example.)

And so I wouldn’t say that a well-designed Friendly AI must necessarily refuse to push that one person off the ledge to stop the train. Obviously, I would expect any decent superintelligence to come up with a superior third alternative. But if those are the only two alternatives, and the FAI judges that it is wiser to push the one person off the ledge—even after taking into account knock-on effects on any humans who see it happen and spread the story, etc.—then I don’t call it an alarm light, if an AI says that the right thing to do is sacrifice one to save five. Again, I don’t go around pushing people into the paths of trains myself, nor stealing from banks to fund my altruistic projects. I happen to be a human. But for a Friendly AI to be corrupted by power would be like it starting to bleed red blood. The tendency to be corrupted by power is a specific biological adaptation, supported by specific cognitive circuits, built into us by our genes for a clear evolutionary reason. It wouldn’t spontaneously appear in the code of a Friendly AI any more than its transistors would start to bleed.

I would even go further, and say that if you had minds with an inbuilt warp that made them overestimate the external harm of self-benefiting actions, then they would need a rule “the ends do not prohibit the means”—that you should do what benefits yourself even when it (seems to) harm the tribe. By hypothesis, if their society did not have this rule, the minds in it would refuse to breathe for fear of using someone else’s oxygen, and they’d all die. For them, an occasional overshoot in which one person seizes a personal benefit at the net expense of society, would seem just as cautiously virtuous—and indeed be just as cautiously virtuous—as when one of us humans, being cautious, passes up an opportunity to steal a loaf of bread that really would have been more of a benefit to them than a loss to the merchant (including knock-on effects).

“The end does not justify the means” is just consequentialist reasoning at one meta-level up. If a human starts thinking on the object level that the end justifies the means, this has awful consequences given our untrustworthy brains; therefore a human shouldn’t think this way. But it is all still ultimately consequentialism. It’s just reflective consequentialism, for beings who know that their moment-by-moment decisions are made by untrusted hardware.