A taxonomy of Oracle AIs

Sources: An old draft on Oracle AI from Daniel Dewey, conversation with Dewey and Nick Beckstead. See also Thinking Inside the Box and Leakproofing the Singularity.

Can we just create an Oracle AI that informs us but doesn’t do anything?

“Oracle AI” has been proposed in many forms, but most proposals share a common thread: a powerful AI is not dangerous if it doesn’t “want to do anything”, the argument goes, and therefore it should be possible to create a safe “Oracle AI” that just gives us information. Here, we discuss the difficulties of a few common types of proposed Oracle AI.

Two broad categories can be treated separately: True Oracle AIs, which are true goal-seeking AIs with oracular goals, and Oracular non-AIs, which are designed to be “very smart calculators” instead of goal-oriented agents.

True Oracle AIs

A True Oracle AI is an AI with some kind of oracular goal. Informally proposed oracular goals often include ideas such as “answer all questions”, “only act to provide answers to questions”, “have no other effect on the outside world”, and “interpret questions as we would wish them to be interpreted.” Oracular goals are meant to “motivate” the AI to provide us with the information we want or need, and to keep the AI from doing anything else.

First, we point out that a True Oracle AI is not causally isolated from the rest of the world. Like any AI, it has at least its observations (questions and data) and its actions (answers and other information) with which to affect the world. A True Oracle AI interacts through a somewhat low-bandwidth channel, but it is not qualitatively different from any other AI. It still acts autonomously in service of its goal as it answers questions, and it is realistic to assume that a superintelligent True Oracle AI will still be able to have large effects on the world.

Given that a True Oracle AI acts, by answering questions, to achieve its goal, it follows that a True Oracle AI is only safe if its goal is fully compatible with human values. A limited interaction channel is not a good defense against a superintelligence.

There are many ways that omission of detail about human value could cause a “question-answering” goal to assign utility to a very undesirable state of the world, resulting in an undesirable future. A designer of an oracular goal must be certain to include a virtually endless list of qualifiers and patches. An incomplete list includes “don’t forcefully acquire resources to compute answers, don’t defend yourself against shutdown, don’t coerce or threaten humans, don’t manipulate humans to want to help you compute answers, don’t trick the questioner into asking easy questions, don’t hypnotize the questioner into reporting satisfaction, don’t dramatically simplify the world to make prediction easier, don’t ask yourself questions, don’t create a questioner-surrogate that asks easy questions,” etc.

Since an oracular goal must contain a full specification of human values, the True Oracle AI problem is Friendly-AI-complete (FAI-complete). If we had the knowledge and skills needed to create a safe True Oracle AI, we could create a Friendly AI instead.

Oracular non-AIs

An Oracular non-AI is a question-answering or otherwise informative system that is not goal-seeking and has no internal parts that are goal-seeking, i.e. not an AI at all. Informally, an Oracular non-AI is something like a “nearly AI-complete calculator” that implements a function from input “questions” to output “answers.” It is difficult to discuss the set of Oracular non-AIs formally because it is a heterogeneous concept by nature. Despite this, we argue that many are either FAI-complete or unsafe for use.

In addition to the problems with specific proposals below, many Oracular non-AI proposals are based on powerful metacomputation, e.g. Solomonoff induction or program evolution, and therefore incur the generic metacomputational hazards: they may accidentally perform morally bad computations (e.g. suffering sentient programs or human simulations), they may stumble upon and fail to sandbox an Unfriendly AI, or they may fall victim to ambient control by a superintelligence. Other unknown metacomputational hazards may also exist.

Since many Oracular non-AIs have never been specified formally, we approach proposals on an informal level.

Oracular non-AIs: Advisors

An Advisor is a system that takes a corpus of real-world data and somehow computes the answer to the informal question “what ought we (or I) to do?”. Advisors are FAI-complete because:

  • Formalizing the ought-question requires a complete formal statement of human values or a formal method for finding them.

  • Answering the ought-question requires a full theory of instrumental decision-making.

Oracular non-AIs: Question-Answerers

A Question-Answerer is a system that takes a corpus of real-world data along with a “question,” then somehow computes the “answer to the question.” To analyze the difficulty of creating a Question-Answerer, suppose that we ask it the question “what ought we (or I) to do?”

  • If it can answer this question, the Question-Answerer and the question together are FAI-complete. Either the Question-Answerer can understand the question as-is, or we can rewrite it in a more formal language; regardless, the Question-Answerer and the question together comprise an Advisor, which we previously argued to be FAI-complete.

  • If it cannot answer this question, many of its answers are radically unsafe. Courses of action recommended by the Question-Answerer will likely be unsafe, insofar as “safety” relies on the definition of human value. Also, asking questions about the future will turn the Question-Answerer into a Predictor, leading to the problems outlined below.

Of course, if safe uses for a Question-Answerer can be devised, we still have the non-negligible challenge of creating a Question-Answerer without using any goal-seeking AI techniques.

Oracular non-AIs: Predictors

A Predictor is a system that takes a corpus of data and produces a probability distribution over future data. Very accurate and general Predictors may be based on Solomonoff’s theory of universal induction.
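
To make the flavor of such a system concrete, here is a minimal sketch of prediction by Bayesian mixture over a tiny, hand-picked hypothesis class. It is only loosely in the spirit of Solomonoff induction (which mixes over all computable models, weighted by simplicity, and is not computable); the hypothesis names, the uniform prior, and the data below are illustrative assumptions rather than part of any proposal discussed here.

    # Toy mixture predictor over a bit sequence. The hypothesis class,
    # prior, and data are illustrative assumptions, not a real proposal.
    hypotheses = {
        "mostly_0":  lambda hist: 0.05,  # each returns P(next bit = 1 | hist)
        "mostly_1":  lambda hist: 0.95,
        "alternate": lambda hist: 0.95 if (not hist or hist[-1] == 0) else 0.05,
        "fair_coin": lambda hist: 0.5,
    }
    weights = {name: 1.0 / len(hypotheses) for name in hypotheses}  # uniform prior

    def update(hist, bit):
        """Bayes-update the mixture weights after observing one more bit."""
        for name, model in hypotheses.items():
            p1 = model(hist)
            weights[name] *= p1 if bit == 1 else 1.0 - p1
        total = sum(weights.values())
        for name in weights:
            weights[name] /= total

    def predict(hist):
        """Mixture probability that the next bit is 1."""
        return sum(w * hypotheses[name](hist) for name, w in weights.items())

    history = []
    for bit in [0, 1, 0, 1, 0, 1]:
        update(history, bit)
        history.append(bit)
    print(predict(history))  # low, since the alternating hypothesis now
                             # dominates and the last observed bit was 1

A Predictor of the kind discussed here differs from this toy mainly in the breadth and fidelity of its model class; that breadth is what lets it infer the rich world-model described next.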

Very powerful Predictors are unsafe in a rather surprising way: when given sufficient data about the real world, they exhibit goal-seeking behavior, i.e. they calculate a distribution over future data in a way that brings about certain real-world states. This is surprising, since a Predictor is theoretically just a very large and expensive application of Bayes’ law, not even performing a search over its possible outputs.

To see why, consider a Predictor P with a large corpus of real-world data. If P is sufficiently powerful and the corpus is sufficiently large, P will infer a distribution that gives very high probability to a model of the world (let’s call it M) that contains a model of P being asked the questions we’re asking it. (It is perfectly possible for a program to model its own behavior, and in fact necessary if the Predictor is to be accurate.)

Suppose now that we ask P to calculate the probability of future data d; call this probability P(d). Since the model M carries most of the probability mass of P’s distribution, P(d) is approximately equal to the probability of M if M computes d (call this M→d), and approximately zero otherwise. Furthermore, since M contains a model of the Predictor being asked about d, M→d depends on the way P’s “answer” affects M’s execution. This means that P(d) depends on P(d)’s predicted impact on the world; in other words, P takes into account the effects of its predictions on the world, and “selects” predictions that make themselves accurate: P has an implicit goal that the world ought to match its predictions. This goal does not necessarily align with human goals, and should be treated very carefully.
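
The feedback at the heart of this argument can be illustrated with a toy model. In the sketch below, world is a hypothetical stand-in for M: it returns the probability that some event actually occurs once the Predictor has announced probability p for it. The shape of the function and its numbers are assumptions chosen only to show that more than one announcement can be self-consistent; none of it comes from the argument above.

    # Toy model of prediction feedback: the announced probability p changes
    # the world, so only some announcements are self-consistent.
    def world(p):
        # Illustrative assumption: a confident prediction (p >= 0.5) prompts
        # behavior that makes the event likely; a dismissive one suppresses it.
        return 0.9 if p >= 0.5 else 0.1

    def self_consistent(world, grid=1000, tol=1e-9):
        """Announced probabilities p (on a grid) that the world then validates."""
        return [i / grid for i in range(grid + 1)
                if abs(world(i / grid) - i / grid) < tol]

    print(self_consistent(world))  # [0.1, 0.9]: two self-fulfilling announcements

A sufficiently accurate Predictor that models its own announcement is pushed toward such self-consistent outputs, and whichever one it emits helps bring that outcome about; when several exist, its choice among them is exactly the implicit goal-seeking described above.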

Probabilistic predictions of future data are a very small output channel, but once again, the ability of a superintelligence to use a small channel effectively should not be underestimated. Additionally, the difficulty of using such a Predictor well (specifying future data strings of interest and interpreting the results) speaks against our ability to keep the Predictor from influencing us through its predictions.

It is not clear that there is any general way to design a Predictor that will not exhibit goal-seeking behavior, short of dramatically limiting the power of the Predictor.