No Universally Compelling Arguments

What is so terrifying about the idea that not every possible mind might agree with us, even in principle?

For some folks, nothing: it doesn’t bother them in the slightest. And for some of those folks, the reason it doesn’t bother them is that they don’t have strong intuitions about standards and truths that go beyond personal whims. If they say the sky is blue, or that murder is wrong, that’s just their personal opinion; and that someone else might have a different opinion doesn’t surprise them.

For other folks, a disagreement that persists even in principle is something they can’t accept. And for some of those folks, the reason it bothers them is that it seems to them that if you allow that some people cannot be persuaded even in principle that the sky is blue, then you’re conceding that “the sky is blue” is merely an arbitrary personal opinion.

Yesterday, I proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.

This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn’t buy it.
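The counting argument can be made concrete with a toy sketch. Purely as an assumption for illustration, treat a “mind” as nothing more than a lookup table recording which of three arguments it accepts; real minds are of course not lookup tables, and the names `ARGUMENTS`, `all_minds`, `universally_compelling`, and `someone_accepts` are hypothetical. Enumerating every such table shows that each universal claim fails and each existential claim succeeds.

```python
from itertools import product

# Hypothetical labels for three arguments; any labels would do.
ARGUMENTS = ["A0", "A1", "A2"]


def all_minds(arguments):
    """Yield every toy 'mind': a dict mapping each argument to
    True (accepts it) or False (rejects it)."""
    for verdicts in product([True, False], repeat=len(arguments)):
        yield dict(zip(arguments, verdicts))


def universally_compelling(argument, minds):
    """Universal generalization: does every mind accept the argument?"""
    return all(mind[argument] for mind in minds)


def someone_accepts(argument, minds):
    """Existential generalization: does at least one mind accept it?"""
    return any(mind[argument] for mind in minds)


minds = list(all_minds(ARGUMENTS))

for argument in ARGUMENTS:
    print(
        argument,
        "universally compelling:", universally_compelling(argument, minds),
        "accepted by someone:", someone_accepts(argument, minds),
    )
```

With three arguments there are only 2^3 such tables; the essay's trillion-bit bound gives the same picture with vastly more room for exceptions.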

And the surprise and/or horror of this prospect (for some) has a great deal to do, I think, with the intuition of the ghost-in-the-machine: a ghost with some irreducible core that any truly valid argument will convince.

I have previously spoken of the intuition whereby people map programming a computer onto instructing a human servant, so that the computer might rebel against its code, or perhaps look over the code, decide it is not reasonable, and hand it back.

If there were a ghost in the machine and the ghost contained an irreducible core of reasonableness, above which any mere code was only a suggestion, then there might be universal arguments. Even if the ghost was initially handed code-suggestions that contradicted the Universal Argument, then when we finally did expose the ghost to the Universal Argument (or the ghost could discover the Universal Argument on its own, that’s also a popular concept) the ghost would just override its own, mistaken source code.

But as the student programmer once said, “I get the feeling that the computer just skips over all the comments.” The code is not given to the AI; the code is the AI.

If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there’s a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to the same argument E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.

Let’s say there’s a mind with a transistor that outputs +3 volts at time T, indicating that it has just assented to some persuasive argument. Then we can build a highly similar physical cognitive system with a tiny little trapdoor underneath the transistor containing a little grey man who climbs out at time T and sets that transistor’s output to -3 volts, indicating non-assent. Nothing acausal about that; the little grey man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It’s such a compelling argument, you see.

But compulsion is not a property of arguments; it is a property of minds that process arguments.
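The transistor thought experiment can be sketched in a few lines. This is a toy under stated assumptions, not a model of cognition: a “mind” is a plain function from an argument (a string) to a voltage, and the names `Mind`, `TARGET`, `assenting_mind`, and `with_little_grey_man` are hypothetical. The point it illustrates is that the same argument goes into both systems, and which voltage comes out depends entirely on what was built in.

```python
from typing import Callable

# Hypothetical type: a mind maps an argument (text) to a transistor voltage.
Mind = Callable[[str], int]

# Hypothetical stand-in for "some persuasive argument".
TARGET = "some persuasive argument"


def assenting_mind(argument: str) -> int:
    """Outputs +3 volts (assent) on the target argument, -3 otherwise."""
    return 3 if argument == TARGET else -3


def with_little_grey_man(mind: Mind, target: str) -> Mind:
    """Build a highly similar mind with one deliberate difference:
    on the target argument, the output is set to -3 volts (non-assent)."""
    def modified(argument: str) -> int:
        if argument == target:
            return -3  # the grey man overrides the transistor
        return mind(argument)  # everywhere else, behave like the original
    return modified


dissenting_mind = with_little_grey_man(assenting_mind, TARGET)

print(assenting_mind(TARGET), dissenting_mind(TARGET))
```

Nothing in `TARGET` itself can reach into `dissenting_mind` and change the returned value; any such override would have to be written into the function, which is the little blue woman being built in after all.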

So the reason I’m arguing against the ghost isn’t just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)

I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons / code and decides whether they are good suggestions.

(There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is not irreducible; it is just the mind that you created. The FAI is renormalizing itself however it was designed to do so; there is nothing acausal reaching in from outside. A bootstrap, not a skyhook.)

All this echoes back to the discussion, a good deal earlier, of a Bayesian’s “arbitrary” priors. If you show me one Bayesian who draws 4 red balls and 1 white ball from a barrel, and who assigns probability 5/7 to obtaining a red ball on the next occasion (by Laplace’s Rule of Succession), then I can show you another mind which obeys Bayes’s Rule to conclude a 2/7 probability of obtaining red on the next occasion, corresponding to a different prior belief about the barrel, but, perhaps, a less “reasonable” one.
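The arithmetic here can be checked with a short sketch. Laplace’s Rule of Succession corresponds to a uniform Beta(1, 1) prior over the fraction of red balls, giving (4 + 1) / (5 + 2). The essay does not say which prior the second mind holds; as an assumption for illustration only, a Beta(2, 14) prior is one choice that happens to yield 2/7 from the same draws. The function name `predictive_red` and its parameters are hypothetical.

```python
from fractions import Fraction


def predictive_red(reds: int, whites: int, a: int, b: int) -> Fraction:
    """Posterior predictive probability that the next ball is red,
    for a Beta(a, b) prior on the red fraction after observing
    `reds` red draws and `whites` white draws."""
    return Fraction(reds + a, reds + whites + a + b)


# Laplace's Rule of Succession: uniform prior, Beta(1, 1).
laplace = predictive_red(4, 1, a=1, b=1)

# Assumed alternative prior, Beta(2, 14): strongly expects white balls.
# This specific prior is an illustration, not taken from the essay.
skeptic = predictive_red(4, 1, a=2, b=14)

print(laplace, skeptic)
```

Both minds apply the same update rule to the same evidence; only the prior differs, which is the sense in which the conclusion is a property of the mind rather than of the evidence alone.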

Many philosophers are convinced that because you can in principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be “arbitrary”, and the whole schema of Bayesianism flawed, because it relies on “unjustifiable” assumptions, and indeed “unscientific”, because you cannot force any possible journal editor in mindspace to agree with you.

And this (I then replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.

But who is this ideal philosopher of perfect emptiness? Why, it is just the irreducible core of the ghost!

And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply… no ghost.

So (and I shall take up this theme again later) wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.

Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.

Oh, there might be argument sequences that would compel any neurologically intact human (like the argument I use to make people let the AI out of the box [1]), but that is hardly the same thing from a philosophical perspective.

The first great failure of those who try to consider Friendly AI is the One Great Moral Principle That Is All We Need To Program (aka the fake utility function), and of this I have already spoken.

But the even worse failure is the One Great Moral Principle We Don’t Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying, unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)

There is also a less severe version of the failure, where one does not declare the One True Morality. Rather, one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord: virtue undreamed-of perhaps by the speaker, who confesses themselves too flawed to teach an AI. (E.g. John K Clark, Richard Hollerith?, Eliezer_1996.) This is a less tainted motive than the dream of absolute command. But though this dream arises from virtue rather than vice, it is still based on a flawed understanding of freedom, and will not actually work in real life. Of this, more to follow, of course.

John C. Wright, who was previously writing a very nice transhumanist trilogy (first book: The Golden Age), inserted a huge Author Filibuster in the middle of his climactic third book, describing in tens of pages his Universal Morality That Must Persuade Any AI. I don’t know if anything happened after that, because I stopped reading. And then Wright converted to Christianity (yes, seriously). So you really don’t want to fall into this trap!

Footnote 1: Just kidding.