Some alternatives to “Friendly AI”

Cross-posted from my blog.

What does MIRI’s research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: “Friendly AI.” The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than “Friendly AI.” Also, people like being safe, and governments like saying they’re funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of “Oh, so you think me and everybody else in AI is working on unsafe AI?” But I’ve never actually heard that response to “AGI safety” in the wild, and AI safety researchers regularly discuss “software system safety” and “AI safety” and “agent safety” and more specific topics like “safe reinforcement learning” without provoking negative reactions from people doing regular AI research.

I’m more worried that a term like “safe AGI” could provoke a response of “So you’re trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that’s impossible. Your research program is a pipe dream.”

My reply goes something like “Yeah, it’s way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don’t think we can get the whole world to promise never to build AGI just because it’s hard to make safe, so we’re going to give AGI safety a solid try for a few decades and see what can be discovered.” But that’s probably not all that reassuring.

How about high-assurance AGI? In computer science, a “high assurance system” is one built from the ground up for unusually strong safety and/or security guarantees, because it’s going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there’s a nice analogy to MIRI’s work, where we’re trying to figure out what an AGI would look like if it were built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. “High assurance AGI” sounds a bit like “Totally safe all-powerful demon lord.” It sounds even more wildly unimaginable to AI researchers than “safe AGI.”

What about superintelligence control or AGI control, as in Bostrom (2014)? “AGI control” is perhaps more believable than “high-assurance AGI” or “safe AGI,” since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It’s okay if they learn later that containment probably isn’t an ultimate solution to the problem.)

On the other hand, it might provoke a reaction of “What, you don’t think sentient robots have any rights, and you’re free to control and confine them in any way you please? You’re just repeating the immoral mistakes of the old slavemasters!” Which of course isn’t true, but it takes some time to explain how I can think it’s obvious that conscious machines have moral value while also being in favor of AGI control methods.

How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don’t think is true.

What about beneficial AGI? That’s better than “ethical AGI,” I think, but like “ethical AGI” and “Friendly AI,” the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be “nice” and “trustworthy” and “not harmful” and oh yeah it must be “virtuous” too, whatever that means.

So yeah, I dunno. I think “AGI safety” is my least-disliked term these days, but I wish I knew of some better options.