FAI FAQ draft: What is the history of the Friendly AI concept?

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question “What is the history of the Friendly AI concept?”

_____

Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:

...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.

...the time will come when the machines will hold the real supremacy over the world and its inhabitants...

This basic idea was picked up by science fiction authors, for example in John W. Campbell’s (1932) short story The Last Evolution. In the story, humans live lives of leisure because machines are smart enough to do all the work. One day, aliens invade:

Then came the Outsiders. Whence they came, neither machine nor man ever learned, save only that they came from beyond the outermost planet, from some other sun. Sirius—Alpha Centauri—perhaps! First a thin scoutline of a hundred great ships, mighty torpedoes of the void [3.5 miles] in length, they came.

Earth’s machines defeat the alien invaders in humanity’s defense, but the aliens’ machines survive long enough to render humans extinct before they, too, are destroyed. Earth’s machines inherit the solar system, eventually moving to run on substrates of pure “Force.”

The concerns of machine ethics are most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his 1942 short story Runaround. Asimov used his stories, including those collected in the popular book I, Robot, to illustrate the many ways in which such simple rules for governing robot behavior could go wrong.

In the year of I, Robot’s release, mathematician Alan Turing (1950) predicted that machines would one day be capable of genuine thought:

I believe that at the end of the century… one will be able to speak of machines thinking without expecting to be contradicted.

Turing (1951/2004) concluded:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers… At some stage therefore we should have to expect the machines to take control...

Bayesian statistician I.J. Good (1965), who had worked with Turing to crack Nazi codes in World War II, made the crucial leap to the ‘intelligence explosion’ concept:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make...

Futurist Arthur C. Clarke (1968) agreed:

Though we have to live and work with (and against) today’s mechanical morons, their deficiencies should not blind us to the future. In particular, it should be realized that as soon as the borders of electronic intelligence are passed, there will be a kind of chain reaction, because the machines will rapidly improve themselves… there will be a mental explosion; the merely intelligent machine will swiftly give way to the ultraintelligent machine....

Perhaps our role on this planet is not to worship God but to create Him.

Julius Lukasiewicz (1974) noted that human intelligence might be unable to predict what a superintelligent machine would do:

The survival of man may depend on the early construction of an ultraintelligent machine-or the ultraintelligent machine may take over and render the human race redundant or develop another form of life. The prospect that a merely intelligent man could ever attempt to predict the impact of an ultraintelligent device is of course unlikely but the temptation to speculate seems irresistible.

Even critics of AI like Jack Schwartz (1987) saw the implications:

If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man’s near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.

Novelist Vernor Vinge (1981) called this ‘event horizon’ in our ability to predict the future a ‘singularity’:

Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It’s a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity—a place where extrapolation breaks down and new models must be applied—and the world will pass beyond our understanding.

Eliezer Yudkowsky (1996) used the term ‘singularity’ to refer instead to Good’s ‘intelligence explosion’, and began work on the task of figuring out how to build a self-improving AI that would have a positive rather than a negative effect on the world (Yudkowsky 2000) — a project he eventually called ‘Friendly AI’ (Yudkowsky 2001).

Meanwhile, philosophers and AI researchers were considering whether machines could have moral value, and how to ensure ethical behavior from less powerful machines or ‘narrow AIs’, a field of inquiry variously known as ‘artificial morality’ (Danielson 1992; Allen et al. 2000; Floridi & Sanders 2004), ‘machine ethics’ (Hall 2000; McLaren 2005; Anderson & Anderson 2006), ‘computational ethics’ (Allen 2002), ‘computational metaethics’ (Lokhorst 2011), and ‘robo-ethics’ or ‘robot ethics’ (Capurro et al. 2006; Sawyer 2007). This vein of research — what we’ll call the ‘machine ethics’ literature — was recently summarized in two books: Wallach & Allen (2009) and Anderson & Anderson (2011).

Leading philosopher of mind David Chalmers brought the concepts of intelligence explosion and Friendly AI to mainstream academic attention with his 2010 paper, ‘The Singularity: A Philosophical Analysis’, published in the Journal of Consciousness Studies. That journal’s January 2012 issue will be devoted to responses to Chalmers’ article, as will an edited volume from Springer (Eden et al. 2012).

Friendly AI researchers do not regularly cite the machine ethics literature (see, e.g., Bostrom & Yudkowsky 2011). They have, however, put forward their own preliminary proposals for ensuring ethical behavior in superintelligent or self-improving machines, for example ‘Coherent Extrapolated Volition’ (Yudkowsky 2004).