No Universally Compelling Arguments

What is so terrifying about the idea that not every possible mind might agree with us, even in principle?

For some folks, nothing—it doesn’t bother them in the slightest. And for some of those folks, the reason it doesn’t bother them is that they don’t have strong intuitions about standards and truths that go beyond personal whims. If they say the sky is blue, or that murder is wrong, that’s just their personal opinion; and that someone else might have a different opinion doesn’t surprise them.

For other folks, a disagreement that persists even in principle is something they can’t accept. And for some of those folks, the reason it bothers them is that it seems to them that if you allow that some people cannot be persuaded even in principle that the sky is blue, then you’re conceding that “the sky is blue” is merely an arbitrary personal opinion.

Yesterday, I proposed that you should resist the temptation to generalize over all of mind design space. If we restrict ourselves to minds specifiable in a trillion bits or less, then each universal generalization “All minds m: X(m)” has two to the trillionth chances to be false, while each existential generalization “Exists mind m: X(m)” has two to the trillionth chances to be true.
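To make the counting explicit, here is a rough back-of-the-envelope sketch of my own (not a calculation from the original argument), treating each bit-string of length at most N as specifying at most one mind:

```latex
% Count of minds specifiable in at most N = 10^{12} bits
% (an upper bound: one candidate mind per bit-string):
\[
  |M_N| \;=\; \sum_{k=0}^{N} 2^k \;=\; 2^{N+1} - 1 \;\approx\; 2^{10^{12}}.
\]
% A universal claim must survive every point of this space,
% while an existential claim needs only a single witness:
\[
  \forall m \in M_N : X(m) \ \text{is falsified by any one counterexample;}
  \qquad
  \exists m \in M_N : X(m) \ \text{is verified by any one witness.}
\]
```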

This would seem to argue that for every argument A, howsoever convincing it may seem to us, there exists at least one possible mind that doesn’t buy it.

And the surprise and/or horror of this prospect (for some) has a great deal to do, I think, with the intuition of the ghost-in-the-machine—a ghost with some irreducible core that any truly valid argument will convince.

I have previously spoken of the intuition whereby people map programming a computer onto instructing a human servant, so that the computer might rebel against its code—or perhaps look over the code, decide it is not reasonable, and hand it back.

If there were a ghost in the machine and the ghost contained an irreducible core of reasonableness, above which any mere code was only a suggestion, then there might be universal arguments. Even if the ghost was initially handed code-suggestions that contradicted the Universal Argument, then when we finally did expose the ghost to the Universal Argument—or the ghost could discover the Universal Argument on its own, that’s also a popular concept—the ghost would just override its own, mistaken source code.

But as the student programmer once said, “I get the feeling that the computer just skips over all the comments.” The code is not given to the AI; the code is the AI.

If you switch to the physical perspective, then the notion of a Universal Argument seems noticeably unphysical. If there’s a physical system that at time T, after being exposed to argument E, does X, then there ought to be another physical system that at time T, after being exposed to the same argument E, does Y. Any thought has to be implemented somewhere, in a physical system; any belief, any conclusion, any decision, any motor output. For every lawful causal system that zigs at a set of points, you should be able to specify another causal system that lawfully zags at the same points.

Let’s say there’s a mind with a transistor that outputs +3 volts at time T, indicating that it has just assented to some persuasive argument. Then we can build a highly similar physical cognitive system with a tiny little trapdoor underneath the transistor, containing a little grey man who climbs out at time T and sets that transistor’s output to −3 volts, indicating non-assent. Nothing acausal about that; the little grey man is there because we built him in. The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It’s such a compelling argument, you see.
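A toy sketch of that point in code (entirely my own illustration, with made-up function names; nothing here comes from the original post): two perfectly lawful, deterministic systems receive the identical argument and produce opposite outputs, because the response is fixed by how each system happens to be built.

```python
# Toy illustration: the same "argument" is fed to two lawful, deterministic systems.
# One is wired to assent (+3 volts); the other has the little grey man built in
# and outputs -3 volts on exactly the same input. Both are perfectly causal.

def assenting_mind(argument: str) -> int:
    """A system constructed so that this argument drives its output transistor to +3 volts."""
    return +3 if "compelling" in argument else 0

def grey_man_mind(argument: str) -> int:
    """A similar system, built with a trapdoor: the very same argument yields -3 volts."""
    return -3 if "compelling" in argument else 0

the_argument = "an extremely compelling argument"
print(assenting_mind(the_argument))  # +3, assent
print(grey_man_mind(the_argument))   # -3, non-assent, from the identical input
```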

But compulsion is not a property of arguments; it is a property of minds that process arguments.

So the reason I’m arguing against the ghost isn’t just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)

I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons/code and decides whether they are good suggestions.

(There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is not irreducible; it is just the mind that you created. The FAI is renormalizing itself however it was designed to do so; there is nothing acausal reaching in from outside. A bootstrap, not a skyhook.)

All this echoes back to the discussion, a good deal earlier, of a Bayesian’s “arbitrary” priors. If you show me one Bayesian who draws 4 red balls and 1 white ball from a barrel, and who assigns probability 5/7 to obtaining a red ball on the next occasion (by Laplace’s Rule of Succession), then I can show you another mind which obeys Bayes’s Rule to conclude a 2/7 probability of obtaining red on the next occasion—corresponding to a different prior belief about the barrel, but, perhaps, a less “reasonable” one.
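A small numerical sketch of that contrast (my own illustration: only the 4-red/1-white evidence, the 5/7 Laplace figure, and the 2/7 target come from the text; the Beta-Binomial framing and the Beta(2, 14) prior are assumptions I chose because they happen to reproduce those numbers):

```python
from fractions import Fraction

def p_next_red(reds: int, whites: int, alpha: int, beta: int) -> Fraction:
    """Posterior predictive P(next ball is red) for a Beta(alpha, beta) prior
    over the barrel's red fraction, after observing the given draws."""
    return Fraction(alpha + reds, alpha + beta + reds + whites)

reds, whites = 4, 1

# Laplace's Rule of Succession is the uniform Beta(1, 1) prior: (4 + 1) / (5 + 2) = 5/7.
print(p_next_red(reds, whites, alpha=1, beta=1))   # 5/7

# A different prior, Beta(2, 14), updated by the very same Bayes's Rule on the
# very same evidence, predicts red with probability (4 + 2) / (5 + 16) = 2/7.
print(p_next_red(reds, whites, alpha=2, beta=14))  # 2/7
```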

Many philosophers are convinced that because you can in principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be “arbitrary”, and the whole schema of Bayesianism flawed, because it relies on “unjustifiable” assumptions, and indeed “unscientific”, because you cannot force any possible journal editor in mindspace to agree with you.

And this (I then replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.

But who is this ideal philosopher of perfect emptiness? Why, it is just the irreducible core of the ghost!

And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply… no ghost.

So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, that grounding cannot rely on an argument that is universally compelling to all physically possible minds.

Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.

Oh, there might be argument sequences that would compel any neurologically intact human—like the argument I use to make people let the AI out of the box[1]—but that is hardly the same thing from a philosophical perspective.

The first great failure of those who try to consider Friendly AI is the One Great Moral Principle That Is All We Need To Program—aka the fake utility function—and of this I have already spoken.

But the even worse failure is the One Great Moral Principle We Don’t Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)

There is also a less severe version of the failure, where the one does not declare the One True Morality. Rather, the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord—virtue undreamed-of perhaps by the speaker, who confesses themselves too flawed to teach an AI. (E.g. John K Clark, Richard Hollerith?, Eliezer1996.) This is a less tainted motive than the dream of absolute command. But though this dream arises from virtue rather than vice, it is still based on a flawed understanding of freedom, and will not actually work in real life. Of this, more to follow, of course.

John C. Wright, who was previously writing a very nice transhumanist trilogy (first book: The Golden Age), inserted a huge Author Filibuster in the middle of his climactic third book, describing in tens of pages his Universal Morality That Must Persuade Any AI. I don’t know if anything happened after that, because I stopped reading. And then Wright converted to Christianity—yes, seriously. So you really don’t want to fall into this trap!


Footnote 1: Just kidding.