Some alternatives to “Friendly AI”

Cross-posted from my blog.

What does MIRI’s research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: “Friendly AI.” The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they’re funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response from AI researchers: “Oh, so you think everybody else in AI and I are working on unsafe AI?” But I’ve never actually heard that response to “AGI safety” in the wild, and AI safety researchers regularly discuss “software system safety” and “AI safety” and “agent safety” and more specific topics like “safe reinforcement learning” without provoking negative reactions from people doing regular AI research.

I’m more worried that a term like “safe AGI” could provoke a response of “So you’re trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that’s impossible. Your research program is a pipe dream.”

My reply goes something like “Yeah, it’s way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don’t think we can get the whole world to promise never to build AGI just because it’s hard to make safe, so we’re going to give AGI safety a solid try for a few decades and see what can be discovered.” But that’s probably not all that reassuring.

How about high-assurance AGI? In computer science, a “high assurance system” is one built from the ground up for unusually strong safety and/or security guarantees, because it’s going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there’s a nice analogy to MIRI’s work, where we’re trying to figure out what an AGI would look like if it were built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. “High assurance AGI” sounds a bit like “Totally safe all-powerful demon lord.” It sounds even more wildly unimaginable to AI researchers than “safe AGI.”

What about superintelligence control or AGI control, as in Bostrom (2014)? “AGI control” is perhaps more believable than “high-assurance AGI” or “safe AGI,” since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It’s okay if they learn later that containment probably isn’t an ultimate solution to the problem.)

On the other hand, it might provoke a reaction of “What, you don’t think sentient robots have any rights, and you’re free to control and confine them in any way you please? You’re just repeating the immoral mistakes of the old slavemasters!” That, of course, isn’t true, but it takes some time to explain how I can think it’s obvious that conscious machines have moral value while also being in favor of AGI control methods.

How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don’t think is true.

What about beneficial AGI? That’s better than “ethical AGI,” I think, but like “ethical AGI” and “Friendly AI,” the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be “nice” and “trustworthy” and “not harmful,” and oh yeah, it must be “virtuous” too, whatever that means.

So yeah, I dunno. I think “AGI safety” is my least-disliked term these days, but I wish I knew of some better options.