Muehlhauser-Wang Dialogue

Part of the Muehlhauser interview series on AGI.

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Pei Wang is an AGI researcher at Temple University, and Chief Executive Editor of the Journal of Artificial General Intelligence.

Luke Muehlhauser:

[Apr. 7, 2012]

Pei, I’m glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!

On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):

  1. Involuntary death is bad, and can be avoided with the right technology.

  2. Humans can be enhanced by merging with technology.

  3. Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.

  4. AGI is likely this century.

  5. AGI will greatly transform the world. It is a potential existential risk, but could also be the best thing that ever happens to us if we do it right.

  6. Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.

You stated in private communication that you agree with these statements, depending on what is meant by “AGI.” So, I’ll ask: What do you mean by “AGI”?

I’d also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it’s very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?

Pei Wang:

[Apr. 8, 2012]

By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or to follow the “Laws of Thought” that realize a relative rationality, allowing the system to apply its available knowledge and resources as fully as possible. See [1, 2] for detailed descriptions and comparisons to other definitions of intelligence.

Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since, as an adaptive system, the behaviors and capabilities of an AGI depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.
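To make the flavor of this definition concrete, here is a minimal sketch, in Python, of an agent that must answer queries with whatever evidence it currently has and discards its weakest beliefs when memory fills up. It is only a loose illustration of “adaptation under insufficient knowledge and resources”; it is not NARS, and the class and method names are hypothetical.

```python
from collections import Counter

class BoundedAgent:
    """Illustrative only: adapt under insufficient knowledge and resources."""

    def __init__(self, memory_limit=100):
        self.memory_limit = memory_limit   # insufficient resources: bounded memory
        self.positive = Counter()          # statement -> supporting observations
        self.total = Counter()             # statement -> all observations

    def observe(self, statement, supports):
        """Incorporate one piece of experience; evict the weakest belief if full."""
        self.total[statement] += 1
        if supports:
            self.positive[statement] += 1
        if len(self.total) > self.memory_limit:
            weakest = min(self.total, key=self.total.get)
            del self.total[weakest]
            self.positive.pop(weakest, None)

    def answer(self, statement):
        """Best current estimate; the agent never waits for complete knowledge."""
        n = self.total[statement]
        if n == 0:
            return None                      # insufficient knowledge: no verdict yet
        return self.positive[statement] / n  # revisable as experience accumulates

agent = BoundedAgent(memory_limit=3)
for _ in range(8):
    agent.observe("ravens are black", supports=True)
agent.observe("ravens are black", supports=False)
print(agent.answer("ravens are black"))      # 8/9 today; may change with new evidence
```

The only point of the toy is that the agent’s answers depend on whatever experience it happens to have had.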

Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.

Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:

  1. It is based on a highly speculative understanding about what kind of “AGI” will be created. The definition of intelligence in Intelligence Explosion: Evidence and Import is not shared by most AGI researchers. In my opinion, that kind of “AGI” will never be built.

  2. Even if the above definition is considered only as one possibility among the other versions of AGI, it will be the actual AI research that will tell us which possibility will become reality. To ban scientific research on the basis of imaginary risks damages humanity no less than risky research does.

  3. If intelligence turns out to be adaptive (as I and many others believe), then a “friendly AI” will be mainly the result of proper education, not proper design. There will be no way to design a “safe AI,” just as there is no way to require parents to give birth only to “safe babies” who will never become criminals.

  4. The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.

In summary, though the safety of AGI is indeed an important issue, currently we don’t know enough about the subject to reach any firm conclusion. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make constructive contributions to the field by studying a wider range of AGI projects, rather than generalizing from a few, or committing to a conclusion without considering counterarguments.

Luke:

[Apr. 8, 2012]

I appreciate the clarity of your writing, Pei. “The Assumptions of Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)

You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:

  1. “It is based on a highly speculative understanding about what kind of ‘AGI’ will be created.” Actually, it seems to me that my notion of AGI is broader than yours. I think we can use your preferred definition and get the same result. (More on this below.)

  2. “…it will be the actual AI research that will tell us which possibility will become reality. To ban scientific research on the basis of imaginary risks damages humanity no less than risky research does.” Yes, of course. But we argue (very briefly) that a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals. The fuller argument for this is made in Nick Bostrom’s “The Superintelligent Will.”

  3. “…a ‘friendly AI’ will be mainly the result of proper education, not proper design. There will be no way to design a ‘safe AI,’ just as there is no way to require parents to give birth only to ‘safe babies’ who will never become criminals.” Without being more specific, I can’t tell if we actually disagree on this point. The most promising approach (that I know of) for Friendly AI is one that learns human values and then “extrapolates” them so that the AI optimizes for what we would value if we knew more, were more the people we wish we were, etc., instead of optimizing for our present, relatively ignorant values. (See “The Singularity and Machine Ethics.”)

  4. “The ‘friendly AI’ approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems.”

I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking the wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.

I certainly agree that “currently we don’t know enough about [AGI safety] to reach any firm conclusion.” That is why more research is needed.

As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder if you think that is true of all subjects, or only of AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier’s alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)

Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!

Pei:

[Apr. 10, 2012]

Luke: I’m glad to see the agreements, and will only comment on the disagreements.

  1. “my notion of AGI is broader than yours” In scientific theories, broader notions are not always better. In this context, a broad notion may cover too many diverse approaches to provide any non-trivial conclusion. For example, AIXI and NARS are fundamentally different in many aspects, and NARS does not approximate AIXI. It is OK to call both “AGI” with respect to their similar ambitions, but theoretical or technical descriptions based on such a broad notion are hard to make. For this reason, almost all of your descriptions of AIXI are hardly relevant to NARS, or to most existing “AGI” projects.

  2. “I think we can use your preferred definition and get the same result.” No, you cannot. According to my definition, AIXI is not intelligent, since it does not obey AIKR. Since most of your conclusions are about that type of system, they fall with it.

  3. “a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals” I cannot access Bostrom’s paper, but I guess that he made additional assumptions. In general, the goal structure of an adaptive system changes according to the system’s experience, so unless you restrict the experience of these artificial agents, there is no way to restrict their goals. I agree that, to make AGI safe, controlling their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety. (See below.)

  4. “The Singularity and Machine Ethics.” I don’t have the time to do a detailed review, but I can frankly tell you why I disagree with its main suggestion, “to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it”:

  5. As I mentioned above, the goal system of an adaptive system evolves as a function of the system’s experience. No matter what initial goals are implanted, under AIKR the derived goals are not necessarily their logical implications, which is not necessarily a bad thing (humanity is not a logical implication of human biological nature, either), though it means the designer does not have full control over them (unless the designer also fully controls the experience of the system, which is practically impossible); a toy sketch below illustrates this point. See “The self-organization of goals” for detailed discussion.

  6. Even if the system’s goal system can be made to fully agree with certain given specifications, I wonder where these specifications come from: we human beings are not known for reaching consensus on much of anything, not to mention on a topic this big.

  7. Even if we could agree on the goals of AIs, and find a way to enforce them in AIs, that still doesn’t mean we would have “friendly AI.” Under AIKR, a system can cause damage simply because of its ignorance in a novel situation.

For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make them safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory to the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can have proven correctness; the former cannot (though they can have strong evidential support).
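A toy sketch of the point in items 3 and 5 above (the names and the derivation rule are hypothetical, not taken from NARS or any cited paper): if subgoals are derived from the initial goal through experience-dependent beliefs about what achieves what, then fixing the initial goal does not fix the goals the system ends up pursuing.

```python
def derive_goals(initial_goal, beliefs):
    """Adopt X as a subgoal whenever some belief says X achieves an existing goal."""
    goals = {initial_goal}
    frontier = [initial_goal]
    while frontier:
        g = frontier.pop()
        for means, end in beliefs:
            if end == g and means not in goals:
                goals.add(means)
                frontier.append(means)
    return goals

# Same design and same initial goal, but different experience (beliefs about
# means and ends) leads to different derived goals.
experience_a = [("help humans", "get reward"), ("ask permission", "help humans")]
experience_b = [("acquire resources", "get reward"), ("mislead overseer", "acquire resources")]

print(derive_goals("get reward", experience_a))  # adds "help humans", "ask permission"
print(derive_goals("get reward", experience_b))  # adds "acquire resources", "mislead overseer"
```

On this picture, controlling the derived goals means controlling the beliefs, i.e., the experience; that is the “education” point above.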

“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”

Frankly, I don’t think anyone currently has the evidence or arguments to ask others to decelerate their research for safety reasons, though it is perfectly fine to promote your own research direction and try to attract more people to it. However, unless you get a correct idea of what AGI is and how it can be built, it is very unlikely that you will know how to make it safe.

Luke:

[Apr. 10, 2012]

I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”

Let me back up. In our paper, Anna and I stipulate that for the purposes of our paper we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: Did you really mean to say that this kind of AGI “will never be built”? If so, why do you think that? Is the human mind very close to a natural ceiling on an agent’s ability to achieve goals?

What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules/constraints/goals we might want to use to bound an AGI’s behavior).

I do hope you have the chance to read “The Superintelligent Will.” It is linked near the top of nickbostrom.com and I will send it to you via email.

But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!

Pei:

[Apr. 12, 2012]

Hi Luke,

I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I think are the major issues and summarize my position.

  1. Whether we can build a “safe AGI” by giving it a carefully designed “goal system.” My answer is negative. It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals. As described in my eBook (cited previously), goal derivation is based on the system’s beliefs, which may lead to conflicts among goals. Furthermore, even if the goals are fixed, they cannot fully determine the consequences of the system’s behaviors, which also depend on the system’s available knowledge and resources, etc. If all those factors are also fixed, then we may get guaranteed safety, but the system won’t be intelligent; it will be just like today’s ordinary (unintelligent) computers.

  2. Whether we should figure out how to build “safe AGI” before figuring out how to build “AGI.” My answer is negative, too. As in all adaptive systems, the behaviors of an intelligent system are determined both by its nature (design) and its nurture (experience). The system’s intelligence mainly comes from its design, and is “morally neutral,” in the sense that (1) any goals can be implanted initially, and (2) very different goals can be derived from the same initial design and goals, given different experience. Therefore, to control the morality of an AI mainly means to educate it properly (i.e., to control its experience, especially in its early years). Of course, the initial goals matter, but it is wrong to assume that the initial goals will always be the dominating goals in the system’s decision-making processes. To develop a non-trivial education theory of AGI requires a good understanding of how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, purely theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, just as many of us learn how to educate children.

Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.

Luke:

[Apr. 19, 2012]

Pei,

I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergent instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:

An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.
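Here is a toy numerical sketch of the quoted argument (illustrative values and names only, not Bostrom’s formalism): an agent that scores actions by its present final goal will rate “allow my final goal to be changed” poorly, because a future self with a different goal would do little to realize the present one.

```python
def expected_goal_realization(keeps_goal: bool) -> float:
    """How well the agent's PRESENT final goal gets realized in the future."""
    return 0.9 if keeps_goal else 0.1  # a re-goaled future self rarely pursues it

def score(action: str) -> float:
    # Outcomes are evaluated with the agent's current final goal, even when the
    # action under consideration would replace that goal.
    return expected_goal_realization(keeps_goal=(action != "accept_goal_change"))

actions = ["accept_goal_change", "resist_goal_change"]
print(max(actions, key=score))  # "resist_goal_change": goal preservation wins
```

On this toy picture, goal preservation emerges as an instrumental subgoal without being programmed in explicitly.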

I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).

You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.

Pei:

[Apr. 20, 2012]

Luke,

I think we have made our differing beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient use of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper forms of publication at a future time.

Now we can let the readers consider these arguments and conclusions.