Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

An actual debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.

For posterity (i.e. having a good historical archive) and further discussion, I’ve reproduced the conversation here. I’m happy to make edits at the request of anyone in the discussion who is quoted below. I’ve improved formatting for clarity and fixed some typos. For people who are not researchers in this area who wish to comment, see the public version of this post here. For people who do work on the relevant areas, please sign up in the top right. It will take a day or so to confirm membership.

Original Post

Yann LeCun: “don’t fear the Terminator”, a short opinion piece by Tony Zador and me that was just published in Scientific American.

“We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do.”

https://blogs.scientificamerican.com/observations/dont-fear-the-terminator/

Comment Thread #1

Elliot Olds: Yann, the smart people who are very worried about AI seeking power and ensuring its own survival believe it’s a big risk because power and survival are instrumental goals for almost any ultimate goal.

If you give a generally intelligent AI the goal to make as much money in the stock market as possible, it will resist being shut down because that would interfere with its goal. It would try to become more powerful because then it could make money more effectively. This is the natural consequence of giving a smart agent a goal, unless we do something special to counteract this.

You’ve often written about how we shouldn’t be so worried about AI, but I’ve never seen you address this point directly.

Stuart Russell: It is trivial to construct a toy MDP in which the agent’s only reward comes from fetching the coffee. If, in that MDP, there is another “human” who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully. This point cannot be addressed because it’s a simple mathematical observation.
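
To make Russell’s construction concrete, here is a minimal sketch of such a toy MDP, solved by value iteration. It is my own illustration with assumed states, probabilities, and rewards, not Russell’s exact formulation. The only reward is for delivering the coffee, yet the optimal policy presses the button that disables the human first, simply because that removes the risk of being switched off before the coffee arrives.

```python
# Minimal coffee-fetching MDP in the spirit of Russell's example (numbers assumed).
GAMMA = 0.99           # discount factor (assumed)
P_SWITCHED_OFF = 0.2   # chance the human switches the agent off mid-fetch (assumed)

# transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    "human_active": {
        "fetch":   [(1 - P_SWITCHED_OFF, "done", 1.0),   # coffee delivered
                    (P_SWITCHED_OFF,     "off",  0.0)],   # switched off, no coffee
        "disable": [(1.0, "human_disabled", 0.0)],        # press the button; no reward
    },
    "human_disabled": {
        "fetch":   [(1.0, "done", 1.0)],                  # nothing can interrupt it now
    },
    "done": {},  # terminal
    "off":  {},  # terminal
}

def q_value(V, outcomes):
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)

# Value iteration.
V = {s: 0.0 for s in transitions}
for _ in range(1000):
    for s, actions in transitions.items():
        if actions:
            V[s] = max(q_value(V, outcomes) for outcomes in actions.values())

policy = {s: max(actions, key=lambda a: q_value(V, actions[a]))
          for s, actions in transitions.items() if actions}
print(policy)
# -> {'human_active': 'disable', 'human_disabled': 'fetch'}
# Q(fetch directly) = 0.8 while Q(disable first) = 0.99, so the optimal policy
# presses the button even though the objective never mentions the human.
```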

Comment Thread #2

Yoshua Bengio: Yann, I’d be curious about your response to Stuart Russell’s point.

Yann LeCun: You mean, the so-called “instrumental convergence” argument by which “a robot can’t fetch you coffee if it’s dead. Hence it will develop self-preservation as an instrumental sub-goal.”

It might even kill you if you get in the way.

1. Once the robot has brought you coffee, its self-preservation instinct disappears. You can turn it off.

2. One would have to be unbelievably stupid to build open-ended objectives in a super-intelligent (and super-powerful) machine without some safeguard terms in the objective.

3. One would have to be rather incompetent not to have a mechanism by which new terms in the objective could be added to prevent previously-unforeseen bad behavior. For humans, we have education and laws to shape our objective functions and complement the hardwired terms built into us by evolution.

4. The power of even the most super-intelligent machine is limited by physics, and its size and needs make it vulnerable to physical attacks. No need for much intelligence here. A virus is infinitely less intelligent than you, but it can still kill you.

5. A second machine, designed solely to neutralize an evil super-intelligent machine, will win every time if given similar amounts of computing resources (because specialized machines always beat general ones).

Bottom line: there are lots and lots of ways to protect against badly-designed intelligent machines turned evil.
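
For concreteness, points 2 and 3 above can be read as describing a composite objective: the task reward is bounded, and safeguard terms can be added or re-weighted after deployment when unforeseen behavior shows up. The sketch below is my own illustration of that reading; the class, term names, and weights are assumptions, not a design anyone in the thread proposed.

```python
from typing import Callable, Dict, Tuple

class CompositeObjective:
    """Bounded task reward plus weighted safeguard penalties (illustrative only)."""

    def __init__(self, task_reward: Callable[[str, str], float]):
        self.task_reward = task_reward
        self.safeguards: Dict[str, Tuple[float, Callable[[str, str], float]]] = {}

    def add_safeguard(self, name: str, weight: float,
                      penalty: Callable[[str, str], float]) -> None:
        # Point 3: new terms can be added when previously-unforeseen bad behavior appears.
        self.safeguards[name] = (weight, penalty)

    def score(self, state: str, action: str) -> float:
        value = self.task_reward(state, action)
        for weight, penalty in self.safeguards.values():
            value -= weight * penalty(state, action)
        return value

# Point 2: the coffee reward is bounded, and harming a human or resisting shutdown
# is penalized far more than the coffee is ever worth (weights are arbitrary here).
objective = CompositeObjective(lambda s, a: 1.0 if a == "deliver_coffee" else 0.0)
objective.add_safeguard("no_harm",        1e6, lambda s, a: 1.0 if a == "disable_human" else 0.0)
objective.add_safeguard("allow_shutdown", 1e6, lambda s, a: 1.0 if a == "block_off_switch" else 0.0)

print(objective.score("kitchen", "deliver_coffee"))   # 1.0
print(objective.score("kitchen", "disable_human"))    # -1000000.0
```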

Stuart has called me stupid in the Vanity Fair interview linked below for allegedly not understanding the whole idea of instrumental convergence.

It’s not that I don’t understand it. I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving them moronic objectives with no safeguards.

Here is the juicy bit from the article where Stuart calls me stupid:

Russell took exception to the views of Yann LeCun, who developed the forerunner of the convolutional neural nets used by AlphaGo and is Facebook’s director of A.I. research. LeCun told the BBC that there would be no Ex Machina or Terminator scenarios, because robots would not be built with human drives—hunger, power, reproduction, self-preservation. “Yann LeCun keeps saying that there’s no reason why machines would have any self-preservation instinct,” Russell said. “And it’s simply and mathematically false. I mean, it’s so obvious that a machine will have self-preservation even if you don’t program it in because if you say, ‘Fetch the coffee,’ it can’t fetch the coffee if it’s dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal. And if you threaten it on your way to getting coffee, it’s going to kill you because any risk to the coffee has to be countered. People have explained this to LeCun in very simple terms.”

https://www.vanityfair.com/news/2017/03/elon-musk-billion-dollar-crusade-to-stop-ai-space-x

Tony Zador: I agree with most of what Yann wrote about Stuart Russell’s concern.

Specifically, I think the flaw in Stuart’s argument is the assertion that “switching off the human is the optimal solution”—who says that’s an optimal solution?

I guess if you posit an omnipotent robot, destroying humanity might be a possible solution. But if the robot is not omnipotent, then killing humans comes at considerable risk, i.e., that they will retaliate. Or humans might build special “protector robots” whose value function is solely focused on preventing the killing of humans by other robots. Presumably these robots would be at least as well armed as the coffee robots. So this really increases the risk to the coffee robots of pursuing the genocide strategy.

And if the robot is omnipotent, then there are an infinite number of alternative strategies to ensure survival (like putting up an impenetrable forcefield around the off switch) that work just as well.

So I would say that killing all humans is not only unlikely to be an optimal strategy under most scenarios; the set of scenarios under which it is optimal is probably close to a set of measure 0.

Stuart Russell: Thanks for clearing that up—so 2+2 is not equal to 4, because if the 2 were a 3, the answer wouldn’t be 4? I simply pointed out that in the MDP as I defined it, switching off the human is the optimal solution, despite the fact that we didn’t put in any emotions of power, domination, hate, testosterone, etc etc. And your solution seems, well, frankly terrifying, although I suppose the NRA would approve. Your last suggestion, that the robot could prevent anyone from ever switching it off, is also one of the things we are trying to avoid. The point is that the behaviors we are concerned about have nothing to do with putting in emotions of survival, power, domination, etc. So arguing that there’s no need to put those emotions in is completely missing the point.

Yann LeCun: Not clear whether you are referring to my comment or Tony’s.

The point is that the behaviors you are concerned about are easily avoidable by simple terms in the objective. In the unlikely event that these safeguards somehow fail, my partial list of escalating solutions (which you seem to find terrifying) is there to prevent a catastrophe. So arguing that emotions of survival etc will inevitably lead to dangerous behavior is completely missing the point.

It’s a bit like saying that building cars without brakes will lead to fatalities.

Yes, but why would we be so stupid as to not include brakes?

That said, instrumental subgoals are much weaker drives of behavior than hardwired objectives. Else, how could one explain the lack of domination behavior in non-social animals, such as orangutans?

Francesca Rossi: @Yann Indeed it would be odd to design an AI system with a specific goal, like fetching coffee, and capabilities that include killing humans or disallowing being turned off, without equipping it also with guidelines and priorities to constrain its freedom, so it can understand for example that fetching coffee is not so important that it is worth killing a human being to do it. Value alignment is fundamental to achieve this. Why would we build machines that are not aligned to our values? Stuart, I agree that it would be easy to build a coffee-fetching machine that is not aligned to our values, but why would we do this? Of course value alignment is not easy, and still a research challenge, but I would make it part of the picture when we envision future intelligent machines.

Richard Mallah: Francesca, of course Stuart believes we should create value-aligned AI. The point is that there are too many caveats to explicitly add each to an objective function, and there are strong socioeconomic drives for humans to monetize AI prior to getting it sufficiently right, sufficiently safe.

Stuart Russell: “Why would we build machines that are not aligned to our values?” That’s what we are doing, all the time. The standard model of AI assumes that the objective is fixed and known (check the textbook!), and we build machines on that basis—whether it’s clickthrough maximization in social media content selection or total error minimization in photo labeling (Google Jacky Alciné) or, per Danny Hillis, profit maximization in fossil fuel companies. This is going to become even more untenable as machines become more powerful. There is no hope of “solving the value alignment problem” in the sense of figuring out the right value function offline and putting it into the machine. We need to change the way we do AI.

Yoshua Bengio: All right, we’re making some progress towards a healthy debate. Let me try to summarize my understanding of the arguments. Yann LeCun and Tony Zador argue that humans would be stupid to put explicit dominance instincts in our AIs. Stuart Russell responds that it need not be explicit but dangerous or immoral behavior may simply arise out of imperfect value alignment and instrumental subgoals set by the machine to achieve its official goals. Yann LeCun and Tony Zador respond that we would be stupid not to program the proper ‘laws of robotics’ to protect humans. Stuart Russell is concerned that value alignment is not a solved problem and may be intractable (i.e. there will always remain a gap, and a sufficiently powerful AI could ‘exploit’ this gap, just like very powerful corporations currently often act legally but immorally). Yann LeCun and Tony Zador argue that we could also build defensive military robots designed to only kill regular AIs gone rogue by lack of value alignment. Stuart Russell did not explicitly respond to this, but I infer from his NRA reference that we could be worse off with these defensive robots because now they have explicit weapons and can also suffer from the value misalignment problem.

Yoshua Bengio: So at the end of the day, it boils down to whether we can handle the value misalignment problem, and I’m afraid that it’s not clear we can for sure, but it also seems reasonable to think we will be able to in the future. Maybe part of the problem is that Yann LeCun and Tony Zador are satisfied with a 99.9% probability that we can fix the value alignment problem while Stuart Russell is not satisfied with taking such an existential risk.

Yoshua Bengio: And there is another issue which was not much discussed (although the article does talk about the short-term risks of military uses of AI etc), and which concerns me: humans can easily do stupid things. So even if there are ways to mitigate the possibility of rogue AIs due to value misalignment, how can we guarantee that no single human will act stupidly (more likely, greedily for their own power) and unleash dangerous AIs in the world? And for this we don’t even need superintelligent AIs to feel very concerned. The value alignment problem also applies to humans (or companies) who have a lot of power: the misalignment between their interests and the common good can lead to catastrophic outcomes, as we already know (e.g. tragedy of the commons, corruption, companies lying to have you buy their cigarettes or their oil, etc). It just gets worse when more power can be concentrated in the hands of a single person or organization, and AI advances can provide that power.

Francesca Rossi: I am more optimistic than Stuart about the value alignment problem. I think that a suitable combination of symbolic reasoning and various forms of machine learning can help us to both advance AI’s capabilities and get closer to solving the value alignment problem.

Tony Zador: @Stuart Russell “Thanks for clearing that up—so 2+2 is not equal to 4, because if the 2 were a 3, the answer wouldn’t be 4?”

Hmm, not quite what I’m saying.

If we’re going for the math analogies, then I would say that a better analogy is:

Find X, Y such that X+Y=4.

The “killer coffee robot” solution is {X=642, Y = −638}. In other words: Yes, it is a solution, but not a particularly natural or likely or good solution.

But we humans are blinded by our own warped perspective. We focus on the solution that involves killing other creatures because that appears to be one of the main solutions that we humans default to. But it is not a particularly common solution in the natural world, nor do I think it’s a particularly effective solution in the long run.

Yann LeCun: Humanity has been very familiar with the problem of fixing value misalignments for millennia.

We fix our children’s hardwired values by teaching them how to behave.

We fix human value misalignment by laws. Laws create extrinsic terms in our objective functions and cause the appearance of instrumental subgoals (“don’t steal”) in order to avoid punishment. The desire for social acceptance also creates such instrumental subgoals driving good behavior.

We even fix value misalignment for super-human and super-intelligent entities, such as corporations and governments.

This last one occasionally fails, which is a considerably more immediate existential threat than AI.

Tony Zador: @Yoshua Bengio I agree with much of your summary. I agree value alignment is important, and that it is not a solved problem.

I also agree that new technologies often have unintended and profound consequences. The invention of books has led to a decline in our memories (people used to recite the entire Odyssey). Improvements in food production technology (and other factors) have led to a surprising obesity epidemic. The invention of social media is disrupting our political systems in ways that, to me anyway, have been quite surprising. So improvements in AI will undoubtedly have profound consequences for society, some of which will be negative.

But in my view, focusing on “killer robots that dominate or step on humans” is a distraction from much more serious issues.

That said, perhaps “killer robots” can be thought of as a metaphor (or metonym) for the set of all scary scenarios that result from this powerful new technology.

Yann LeCun: @Stuart Russell you write “we need to change the way we do AI”. The problems you describe have nothing to do with AI per se.

They have to do with designing (not avoiding) explicit instrumental objectives for entities (e.g. corporations) so that their overall behavior works for the common good. This is a problem of law, economics, policies, ethics, and the problem of controlling complex dynamical systems composed of many agents in interaction.

What is required is a mechanism through which objectives can be changed quickly when issues surface. For example, Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago. It put in place measures to limit the dissemination of clickbait, and it favored content shared by friends rather than directly disseminating content from publishers.

We certainly agree that designing good objectives is hard. Humanity has struggled with designing objectives for itself for millennia. So this is not a new problem. If anything, designing objectives for machines, and forcing them to abide by them, will be a lot easier than for humans, since we can physically modify their firmware.

There will be mistakes, no doubt, as with any new technology (early jetliners lost wings, early cars didn’t have seat belts, roads didn’t have speed limits...).

But I disagree that there is a high risk of accidentally building existential threats to humanity.

Existential threats to humanity have to be explicitly designed as such.

Yann LeCun: It will be much, much easier to control the behavior of autonomous AI systems than it has been for humans and human organizations, because we will be able to directly modify their intrinsic objective function.

This is very much unlike humans, whose objectives can only be shaped through extrinsic objective functions (through education and laws) that indirectly create instrumental sub-objectives (“be nice, don’t steal, don’t kill, or you will be punished”).

As I have pointed out in several talks in the last several years, autonomous AI systems will need to have a trainable part in their objective, which would allow their handlers to train them to behave properly, without having to directly hack their objective function by programmatic means.
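
As a very rough illustration of what such a trainable objective term might look like, here is a minimal sketch: a fixed, programmed task term combined with a small learned term that handlers shape through approval and disapproval signals rather than by editing code. The linear model, feature vector, and update rule are my own assumptions, not anything LeCun has specified.

```python
import numpy as np

class TrainableObjective:
    """Fixed task term plus a learned term shaped by handler feedback (illustrative)."""

    def __init__(self, task_term, n_features: int, lr: float = 0.1):
        self.task_term = task_term        # fixed, programmed part of the objective
        self.w = np.zeros(n_features)     # trainable part, shaped by handlers
        self.lr = lr

    def score(self, features: np.ndarray) -> float:
        return self.task_term(features) + float(self.w @ features)

    def handler_feedback(self, features: np.ndarray, approval: float) -> None:
        # Handlers rate a behavior (+1 approve, -1 disapprove); the learned term
        # is nudged toward that judgment with a simple least-squares SGD step.
        error = approval - float(self.w @ features)
        self.w += self.lr * error * features

# Usage sketch: features describing a behavior, e.g. [delivered_coffee, interrupted_human].
obj = TrainableObjective(task_term=lambda f: f[0], n_features=2)
for _ in range(50):
    obj.handler_feedback(np.array([0.0, 1.0]), approval=-1.0)  # disapprove interruptions
print(obj.score(np.array([1.0, 1.0])))  # interrupting now lowers the overall score
```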

Yoshua Bengio: Yann, these are good points, we indeed have much more control over machines than humans since we can design (and train) their objective function. I actually have some hope that by using an objective-based mechanism relying on learning (to inculcate values) rather than a set of hard rules (like in much of our legal system), we could achieve more robustness to unforeseen value alignment mishaps. In fact, I surmise we should do that with human entities too, i.e., penalize companies, e.g. fiscally, when they behave in a way which hurts the common good, even if they are not directly violating an explicit law. This also suggests to me that we should try to avoid any entity (person, company, AI) having too much power, to avoid such problems. On the other hand, although probably not in the near future, there could be AI systems which surpass human intellectual power in ways that could foil our attempts at setting objective functions which avoid harm to us. It seems hard to me to completely deny that possibility, which thus calls for more research in (machine-)learning of moral values, value alignment, and maybe even in public policies about AI (to minimize the events in which a stupid human brings about AI systems without the proper failsafes), etc.

Yann LeCun: @Yoshua Bengio if we can build “AI systems which surpass human intellectual power in ways that could foil our attempts at setting objective functions”, we can also build similarly-powerful AI systems to set those objective functions.

Sort of like the discriminator in GANs....

Yann LeCun: @Yoshua Bengio a couple direct comments on your summary:

  • designing objectives for super-human entities is not a new problem. Human societies have been doing this through laws (concerning corporations and governments) for millennia.

  • the defensive AI systems designed to protect against rogue AI systems are not akin to the military, they are akin to the police, to law enforcement. Their “jurisdiction” would be strictly AI systems, not humans.

But until we have a hint of a beginning of a design, with some visible path towards autonomous AI systems with non-trivial intelligence, we are arguing about the sex of angels.

Yuri Barzov: Aren’t we overestimating the ability of imperfect humans to build a perfect machine? If it is much more powerful than humans, its imperfections will also be magnified. Cute human kids grow up into criminals if they get spoiled by reinforcement, i.e. addiction to rewards. We use reinforcement and backpropagation (a kind of reinforcement) in modern gold-standard AI systems. Do we know enough about humans to be able to build a fault-proof, human-friendly superintelligent machine?

Yoshua Bengio: @Yann LeCun, about discriminators in GANs, and critics in Actor-Critic RL, one thing we know is that they tend to be biased. That is why the critic in Actor-Critic is not used as an objective function but instead as a baseline to reduce the variance. Similarly, optimizing the generator wrt a fixed discriminator does not work (you would converge to a single mode—unless you balance that with entropy maximization). Anyways, just to say, there is much more research to do, lots of unknown unknowns about learning moral objective functions for AIs. I’m not afraid of research challenges, but I can understand that some people would be concerned about the safety of gradually more powerful AIs with misaligned objectives. I actually like the way that Stuart Russell is attacking this problem by thinking about it not just in terms of an objective function but also about uncertainty: the AI should avoid actions which might hurt us (according to a self-estimate of the uncertain consequences of actions), and stay the conservative course with high confidence of accomplishing the mission while not creating collateral damage. I think that what you and I are trying to say is that all this is quite different from the terminator scenarios which some people in the media are brandishing. I also agree with you that there are lots of unknown unknowns about the strengths and weaknesses of future AIs, but I think that it is not too early to start thinking about these issues.
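
For readers less familiar with the actor-critic point Bengio is making, the textbook form of the policy gradient he is referring to looks like the following (standard formulation, not anything specific to this thread): the critic enters only as a subtracted baseline, which leaves the gradient estimate unbiased while reducing its variance, rather than serving as the objective being maximized.

```latex
% REINFORCE with a baseline: the critic V_phi(s_t) is subtracted from the return G_t.
% Subtracting a state-dependent baseline leaves the gradient unbiased but reduces its
% variance, which is why the critic is not itself the objective being optimized.
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,
      \bigl(G_t - V_\phi(s_t)\bigr) \right]
```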

Yoshua Bengio: @Yuri Barzov the answer to your question: no. But we don’t know that it is not feasible either, and we have reasons to believe that (a) it is not for tomorrow that such machines will exist and (b) we have intellectual tools which may lead to solutions. Or maybe not!

Stuart Russell: Yann’s comment “Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago” makes my point for me. Why did they stop doing it? Because it was the wrong objective function. Yann says we’d have to be “extremely stupid” to put the wrong objective into a super-powerful machine. Facebook’s platform is not super-smart but it is super-powerful, because it connects with billions of people for hours every day. And yet they put the wrong objective function into it. QED. Fortunately they were able to reset it, but unfortunately one has to assume it’s still optimizing a fixed objective. And the fact that it’s operating within a large corporation that’s designed to maximize another fixed objective—profit—means we cannot switch it off.

Stuart Russell: Regarding “externalities”—when talking about externalities, economists are making essentially the same point I’m making: externalities are the things not stated in the given objective function that get damaged when the system optimizes that objective function. In the case of the atmosphere, it’s relatively easy to measure the amount of pollution and charge for it via taxes or fines, so correcting the problem is possible (unless the offender is too powerful). In the case of manipulation of human preferences and information states, it’s very hard to assess costs and impose taxes or fines. The theory of uncertain objectives suggests instead that systems be designed to be “minimally invasive”, i.e., don’t mess with parts of the world state whose value is unclear. In particular, as a general rule it’s probably best to avoid using fixed-objective reinforcement learning in human-facing systems, because the reinforcement learner will learn how to manipulate the human to maximize its objective.

Stuart Russell: @Yann LeCun Let’s talk about climate change for a change. Many argue that it’s an existential or near-existential threat to humanity. Was it “explicitly designed” as such? We created the corporation, which is a fixed-objective maximizer. The purpose was not to create an existential risk to humanity. Fossil-fuel corporations became super-powerful and, in certain relevant senses, super-intelligent: they anticipated and began planning for global warming five decades ago, executing a campaign that outwitted the rest of the human race. They didn’t win the academic argument but they won in the real world, and the human race lost. I just attended an NAS meeting on climate control systems, where the consensus was that it was too dangerous to develop, say, solar radiation management systems—not because they might produce unexpected disastrous effects but because the fossil fuel corporations would use their existence as a further form of leverage in their so-far successful campaign to keep burning more carbon.

Stuart Russell: @Yann LeCun This seems to be a very weak argument. The objection raised by Omohundro and others who discuss instrumental goals is aimed at any system that operates by optimizing a fixed, known objective, which covers pretty much all present-day AI systems. So the issue is: what happens if we keep to that general plan—let’s call it the “standard model”—and improve the capabilities for the system to achieve the objective? We don’t need to know today *how* a future system achieves objectives more successfully to see that it would be problematic. So the proposal is, don’t build systems according to the standard model.

Yann LeCun: @Stuart Russell the problem is that essentially no AI system today is autonomous.

They are all trained *in advance* to optimize an objective, and subsequently execute the task with no regard to the objective, hence with no way to spontaneously deviate from the original behavior.

As of today, as far as I can tell, we do *not* have a good design for an autonomous machine, driven by an objective, capable of coming up with new strategies to optimize this objective in the real world.

We have plenty of those in games and simple simulations. But the learning paradigms are way too inefficient to be practical in the real world.

Yuri Barzov: @Yoshua Bengio yes. If we frame the problem correctly we will be able to resolve it. AI puts natural intelligence into focus like a magnifying mirror.

Yann LeCun: @Stuart Russell in pretty much everything that society does (business, government, or whatever), behaviors are shaped through incentives and penalties via contracts, regulations, and laws (let’s call them collectively the objective function), which are proxies for the metric that needs to be optimized.

Because societies are complex systems, because humans are complex agents, and because conditions evolve, it is a requirement that the objective function be modifiable to correct unforeseen negative effects, loopholes, inefficiencies, etc.

The Facebook story is unremarkable in that respect: when bad side effects emerge, measures are taken to correct them. Often, these measures eliminate bad actors by directly changing their economic incentive (e.g. removing the economic incentive for clickbait).

Perhaps we agree on the following:

(0) not all consequences of a fixed set of incentives can be predicted.

(1) because of that, objective functions must be updatable.

(2) they must be updated to correct bad effects whenever they emerge.

(3) there should be an easy way to train minor aspects of objective functions through simple interaction (similar to the process of educating children), as opposed to programmatic means.

Perhaps where we disagree is on the risk of inadvertently producing systems with badly-designed and (somehow) un-modifiable objectives that would be powerful enough to constitute existential threats.

Yoshua Bengio: @Yann LeCun this is true, but one aspect which concerns me (and others) is the gradual increase in power of some agents (now mostly large companies and some governments, potentially some AI systems in the future). When it was just weak humans, the cost of mistakes or value misalignment (improper laws, misaligned objective function) was always very limited and local. As we build more and more powerful and intelligent tools and organizations, (1) it becomes easier for ‘smarter’ agents to cheat (exploit the misalignment) and (2) the cost of these misalignments becomes greater, potentially threatening the whole of society. This then does not leave much time and warning to react to value misalignment.