AI Risk and Opportunity: Humanity’s Efforts So Far

Part of the series AI Risk and Opportunity: A Strategic Analysis.

(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)

This post chronicles the story of humanity’s growing awareness of AI risk and opportunity, along with some recent AI safety efforts. I will not tackle any strategy questions directly in this post; my purpose today is merely to “bring everyone up to speed.”

I know my post skips many important events and people. Please suggest additions in the comments, and include as much detail as possible.

Early history

Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:

...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.

...the time will come when the machines will hold the real supremacy over the world and its inhabitants...

This basic idea was picked up by science fiction authors, for example in the 1921 Czech play that introduced the term “robot,” R.U.R. In that play, robots grow in power and intelligence and destroy the entire human race, except for a single survivor.

Another exploration of this idea is found in John W. Campbell’s (1932) short story The Last Evolution, in which aliens attack Earth; both the humans and the aliens are killed, but their machines survive and inherit the solar system. Campbell’s (1935) short story The Machine contained perhaps the earliest description of recursive self-improvement:

On the planet Dwranl, of the star you know as Sirius, a great race lived, and they were not too unlike you humans. …they attained their goal of the machine that could think. And because it could think, they made several and put them to work, largely on scientific problems, and one of the obvious problems was how to make a better machine which could think.

The machines had logic, and they could think constantly, and because of their construction never forgot anything they thought it well to remember. So the machine which had been set the task of making a better machine advanced slowly, and as it improved itself, it advanced more and more rapidly. The Machine which came to Earth is that machine.

The concern for AI safety is most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his short story Runaround. Asimov used his stories, including those collected in the popular book I, Robot, to illustrate many of the ways in which such well-meaning and seemingly comprehensive rules for governing robot behavior could go wrong.

In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines may one day be capable of whatever human intelligence can achieve:

I believe that at the end of the century… one will be able to speak of machines thinking without expecting to be contradicted.

Turing (1951) concluded:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers… At some stage therefore we should have to expect the machines to take control...

Given the profound implications of machine intelligence, it’s rather alarming that the early AI scientists who believed AI would be built during the 1950s-1970s didn’t show much interest in AI safety. We are lucky they were wrong about the difficulty of AI — had they been right, humanity probably would not have been prepared to protect its interests.

Later, statistician I.J. Good (1959), who had worked with Turing to crack Nazi codes in World War II, reasoned that the transition from human control to machine control may be unexpectedly sudden:

Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an “explosion” will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.

The more famous formulation of this idea, and the origin of the phrase “intelligence explosion,” is from Good (1965):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make...
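
(A throwaway toy model, entirely my own and not something Good wrote down, makes the intuition vivid: if each machine generation's improvement over its predecessor is a fixed percentage, capability merely grows exponentially, but if the size of each improvement scales with the improver's own ability, capability runs away far faster. The numbers below are arbitrary placeholders.)

```python
# Toy numerical illustration (mine, not Good's) of the "explosion" intuition:
# compare fixed-rate improvement with improvement that scales with the
# improver's own capability.

def run(generations, step):
    """step(c) returns the capability of the machine designed by one of capability c."""
    c, history = 1.0, []
    for _ in range(generations):
        c = step(c)
        history.append(round(c, 2))
    return history

exponential = run(10, lambda c: c * 1.5)            # fixed relative improvement
explosive = run(10, lambda c: c * (1 + 0.5 * c))    # improvement scales with ability

print("fixed-rate improvement:    ", exponential)
print("ability-scaled improvement:", explosive)
```

After ten generations the fixed-rate series has grown roughly 58-fold, while the ability-scaled series has blown past any human-relevant scale; that difference in growth regime is what the phrase “intelligence explosion” is pointing at.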

Good (1970) wrote that “...by 1980 I hope that the implications and the safeguards [concerning machine superintelligence] will have been thoroughly discussed,” and argued that an association devoted to discussing the matter should be created. Unfortunately, no such association was created until either 1991 (Extropy Institute) or 2000 (Singularity Institute), and we might say these issues have not to this day been “thoroughly” discussed.

Good (1982) proposed a plan for the design of an ethical machine:

I envisage a machine that would be given a large number of examples of human behaviour that other people called ethical, and examples of discussions of ethics, and from these examples and discussions the machine would formulate one or more consistent general theories of ethics, detailed enough so that it could deduce the probable consequences in most realistic situations.
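
(To make Good's proposal concrete, here is a toy sketch of my own, not anything Good specified. It treats “formulate a general theory of ethics from examples” as the simplest possible supervised learning problem: average the features of behaviours that humans have labeled ethical or unethical, then classify a new situation by whichever average it sits closer to. The features and labels below are invented placeholders.)

```python
# Toy illustration (mine, not Good's): "learn a theory of ethics" by fitting
# the simplest possible rule to examples of behaviour that humans have
# labeled ethical or unethical, then applying that rule to a new situation.
# The features (harm, consent, honesty) are invented placeholders.

from collections import defaultdict

examples = [
    ((0, 1, 1), "ethical"),    # no harm, consent obtained, honest
    ((1, 0, 0), "unethical"),  # harm, no consent, dishonest
    ((0, 1, 0), "unethical"),  # no harm, consent obtained, but deceptive
    ((0, 0, 1), "ethical"),    # harmless, honest, unsolicited help
]

def fit(examples):
    """A crude 'general theory': the average feature vector for each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for features, label in examples:
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(theory, features):
    """Label a new situation by its nearest centroid (squared distance)."""
    distance = lambda c: sum((a - b) ** 2 for a, b in zip(features, c))
    return min(theory, key=lambda label: distance(theory[label]))

theory = fit(examples)
print(predict(theory, (1, 1, 0)))  # harm with consent, deceptive -> "unethical"
```

Needless to say, the gap between a toy classifier like this and a machine that can “deduce the probable consequences in most realistic situations” is exactly where the hard problems live.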

Even critics of AI like Jack Schwartz (1987) saw the implications of intelligence that can improve itself:

If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man’s near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.

Ray Solomonoff (1985), founder of algorithmic information theory, speculated on the implications of full-blown AI:

After we have reached [human-level AI], it shouldn’t take much more than ten years to construct ten thousand duplicates of our original [human-level AI], and have a total computing capability close to that of the computer science community...

The last 100 years have seen the introduction of special and general relativity, automobiles, airplanes, quantum mechanics, large rockets and space travel, fission power, fusion bombs, lasers, and large digital computers. Any one of these might take a person years to appreciate and understand. Suppose that they had all been presented to mankind in a single year!

Moravec (1988) argued that AI was an existential risk, but nevertheless, one toward which we must run (pp. 100-101):

...intelligent machines… threaten our existence… Machines merely as clever as human beings will have enormous advantages in competitive situations… So why rush headlong into an era of intelligent machines? The answer, I believe, is that we have very little choice, if our culture is to remain viable… The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will expand, or we will be invaded from the stars, or a black hole will swallow the galaxy. The bigger, more diverse, and competent a culture is, the better it can detect and deal with external dangers. The larger events happen less frequently. By growing rapidly enough, a culture has a finite chance of surviving forever.

Ray Kurzweil’s The Age of Intelligent Machines (1990) did not mention AI risk, and his follow-up, The Age of Spiritual Machines (1999), did so only briefly, in an “interview” between the reader and Kurzweil. The reader asks, “So we risk the survival of the human race for [the opportunity AI affords us to expand our minds and advance our ability to create knowledge]?” Kurzweil answers: “Yeah, basically.”

Minsky (1984) pointed out the difficulty of getting machines to do what we want:

...it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer’s Apprentice, or the Monkey’s Paw by W.W. Jacobs.

[Another] risk is exposure to the consequences of self-deception. It is always tempting to say to oneself… that “I know what I would like to happen, but I can’t quite express it clearly enough.” However, that concept itself reflects a too-simplistic self-image, which portrays one’s own self as [having] well-defined wishes, intentions, and goals. This pre-Freudian image serves to excuse our frequent appearances of ambivalence; we convince ourselves that clarifying our intentions is merely a matter of straightening-out the input-output channels between our inner and outer selves. The trouble is, we simply aren’t made that way. Our goals themselves are ambiguous.

The ultimate risk comes when [we] attempt to take that final step — of designing goal-achieving programs that are programmed to make themselves grow increasingly powerful, by self-evolving methods that augment and enhance their own capabilities. It will be tempting to do this, both to gain power and to decrease our own effort toward clarifying our own desires. If some genie offered you three wishes, would not your first one be, “Tell me, please, what is it that I want the most!” The problem is that, with such powerful machines, it would require but the slightest accident of careless design for them to place their goals ahead of [ours]. The machine’s goals may be allegedly benevolent, as with the robots of With Folded Hands, by Jack Williamson, whose explicit purpose was allegedly benevolent: to protect us from harming ourselves, or as with the robot in Colossus, by D.H.Jones, who itself decides, at whatever cost, to save us from an unsuspected enemy. In the case of Arthur C. Clarke’s HAL, the machine decides that the mission we have assigned to it is one we cannot properly appreciate. And in Vernor Vinge’s computer-game fantasy, True Names, the dreaded Mailman… evolves new ambitions of its own.

The Modern Era

Novelist Vernor Vinge (1993) popularized Good’s “intelligence explosion” concept, and wrote the first novel about self-improving AI posing an existential threat: A Fire Upon the Deep (1992). It was probably Vinge who did more than anyone else to spur discussions about AI risk, particularly in online communities like the extropians mailing list (since 1991) and SL4 (since 2000). Participants in these early discussions included several of today’s leading thinkers on AI risk: Robin Hanson, Eliezer Yudkowsky, Nick Bostrom, Anders Sandberg, and Ben Goertzel. (Other posters included Peter Thiel, FM-2030, Robert Bradbury, and Julian Assange.) Proposals like Friendly AI, Oracle AI, and Nanny AI were discussed here long before they were brought to greater prominence with academic publications (Yudkowsky 2008; Armstrong et al. 2012; Goertzel 2012).

Meanwhile, philosophers and AI researchers considered whether or not machines could have moral value, and how to ensure ethical behavior from less powerful machines or ‘narrow AIs’, a field of inquiry variously known as ‘artificial morality’ (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), ‘machine ethics’ (Hall 2000; McLaren 2005; Anderson & Anderson 2006), ‘computational ethics’ (Allen 2002), ‘computational metaethics’ (Lokhorst 2011), and ‘robo-ethics’ or ‘robot ethics’ (Capurro et al. 2006; Sawyer 2007). This vein of research — what I’ll call the ‘machine ethics’ literature — was recently summarized in two books: Wallach & Allen (2009) and Anderson & Anderson (2011). Thus far, there has been a significant communication gap between the machine ethics literature and the AI risk literature (Allen & Wallach 2011), excepting perhaps Muehlhauser & Helm (2012).

The topic of AI safety in the context of existential risk was left to the futurists who had participated in online discussions of AI risk and opportunity. Here, I must cut short my review and focus on just three (of many) important figures: Eliezer Yudkowsky, Robin Hanson, and Nick Bostrom. (Your author also apologizes for the fact that, because he works with Yudkowsky, Yudkowsky gets a more detailed treatment here than Hanson or Bostrom.)

Other figures in the modern era of AI risk research include Bill Hibbard (Super-Intelligent Machines) and Ben Goertzel (“Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood”).

Eliezer Yudkowsky

According to “Eliezer, the person,” Eliezer Yudkowsky (born 1979) was a bright kid — in the 99.9998th percentile of cognitive ability, according to the Midwest Talent Search. He read lots of science fiction as a child, and at age 11 read Great Mambo Chicken and the Transhuman Condition — his introduction to the impending reality of transhumanist technologies like AI and nanotech. The moment he became a Singularitarian was the moment he read page 47 of True Names and Other Dangers by Vernor Vinge:

Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It’s a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity—a place where extrapolation breaks down and new models must be applied—and the world will pass beyond our understanding.

Yudkowsky reported his reaction:

My emotions at that moment are hard to describe; not fanaticism, or enthusiasm, just a vast feeling of “Yep. He’s right.” I knew, in the moment I read that sentence, that this was how I would be spending the rest of my life.

(As an aside, I’ll note that this is eerily similar to my own experience of encountering the famous I.J. Good paragraph about ultraintelligence (quoted above), before I knew what “transhumanism” or “the Singularity” was. I read Good’s paragraph and thought, “Wow. That’s… probably correct. How could I have missed that implication? … … … Well, shit. That changes everything.”)

As a teenager in the mid 1990s, Yudkowsky participated heavily in Singularitarian discussions on the extropians mailing list, and in 1996 (at age 17) he wrote “Staring into the Singularity,” which gained him much attention, as did his popular “FAQ about the Meaning of Life” (1999).

In 1998 Yudkowsky was invited (along with 33 others) by economist Robin Hanson to comment on Vinge (1993). Thirteen people (including Yudkowsky) left comments, then Vinge responded, and a final open discussion was held on the extropians mailing list. Hanson edited together these results here. Yudkowsky thought Max More’s comments on Vinge underestimated how different from humans AI would probably be, and this prompted Yudkowsky to begin an early draft of “Coding a Transhuman AI” (CaTAI) which by 2000 had grown into the first large explication of his thoughts on “Seed AI” and “friendly” machine superintelligence (Yudkowsky 2000).

Around this same time, Yudkowsky wrote “The Plan to the Singularity” and “The Singularitarian Principles,” and launched the SL4 mailing list.

At a May 2000 gathering hosted by the Foresight Institute, Brian Atkins and Sabine Stoeckel discussed with Yudkowsky the possibility of launching an organization specializing in AI safety. In July of that year, Yudkowsky formed the Singularity Institute and began his full-time research on the problems of AI risk and opportunity.

In 2001, he published two “sequels” to CaTAI, “General Intelligence and Seed AI” and, most importantly, “Creating Friendly AI” (CFAI) (Yudkowsky 2001).

The publication of CFAI was a significant event, prompting Ben Goertzel (the pioneer of the new Artificial General Intelligence research community) to say that “Creating Friendly AI is the most intelligent writing about AI that I’ve read in many years,” and prompting Eric Drexler (the pioneer of molecular manufacturing) to write that “With Creating Friendly AI, the Singularity Institute has begun to fill in one of the greatest remaining blank spots in the picture of humanity’s future.”

CFAI was both frustrating and brilliant. It was frustrating because: (1) it was disorganized and opaque, (2) it invented new terms instead of using the terms being used by everyone else, for example speaking of “supergoals” and “subgoals” instead of final and instrumental goals, and speaking of goal systems but never “utility functions,” and (3) it hardly cited any of the relevant works in AI, philosophy, and psychology — for example it could have cited McCulloch (1952), Good (1959, 1970, 1982), Cade (1966), Versenyi (1974), Evans (1979), Lampson (1979), the conversation with Ed Fredkin in McCorduck (1979), Sloman (1984), Schmidhuber (1987), Waldrop (1987), Pearl (1989), De Landa (1991), Crevier (1993, ch. 12), Clarke (1993, 1994), Weld & Etzioni (1994), Buss (1995), Russell & Norvig (1995), Gips (1995), Whitby (1996), Schmidhuber et al. (1997), Barto & Sutton (1998), Jackson (1998), Levitt (1999), Moravec (1999), Kurzweil (1999), Sobel (1999), Allen et al. (2000), Gordon (2000), Harper (2000), Coleman (2001), and Hutter (2001). These features still substantially characterize Yudkowsky’s independent writing, e.g. see Yudkowsky (2010). As late as January 2006, he still wrote that “It is not that I have neglected to cite the existing major works on this topic, but that, to the best of my ability to discern, there are no existing major works to cite.”

On the other hand, CFAI was in many ways brilliant, and it tackled many of the problems left mostly untouched by mainstream machine ethics researchers. For example, CFAI (but not the mainstream machine ethics literature) engaged the problems of: (1) radically self-improving AI, (2) AI as an existential risk, (3) hard takeoff, (4) the interplay of goal content, acquisition, and structure, (5) wireheading, (6) subgoal stomp, (7) external reference semantics, (8) causal validity semantics, and (9) selective support (which Bostrom (2002) would later call “differential technological development”).

For many years, the Singularity Institute was little more than a vehicle for Yudkowsky’s research. In 2002 he wrote “Levels of Organization in General Intelligence,” which later appeared in the first edited volume on Artificial General Intelligence (AGI). In 2003 he wrote what would become the internet’s most popular tutorial on Bayes’ Theorem, followed in 2005 by “A Technical Explanation of Technical Explanation.” In 2004 he explained his vision of a Friendly AI goal structure: “Coherent Extrapolated Volition.” In 2006 he wrote two chapters that would later appear in the Oxford University Press volume Global Catastrophic Risks (co-edited by Bostrom): “Cognitive Biases Potentially Affecting Judgment of Global Risks” and what remains his “classic” article on the need for Friendly AI, “Artificial Intelligence as a Positive and Negative Factor in Global Risk.”

In 2004, Tyler Emerson was hired as the Singularity Institute’s executive director. Emerson brought on Nick Bostrom (then a postdoctoral fellow at Yale), Christine Peterson (of the Foresight Institute), and others as advisors. In February 2006, PayPal co-founder Peter Thiel donated $100,000 to the Singularity Institute, and, we might say, the Singularity Institute as we know it today was born.

From 2005-2007, Yudkowsky worked at various times with Marcello Herreshoff, Nick Hay and Peter de Blanc on the technical problems of AGI necessary for technical FAI work, for example creating AIXI-like architectures, developing a reflective decision theory, and investigating limits inherent in self-reflection due to Löb’s Theorem. Almost none of this research has been published, in part because of the desire not to accelerate AGI research without having made corresponding safety progress. (Marcello also worked with Eliezer during the summer of 2009.)

Much of the Singularity Institute’s work has been “movement-building” work. The institute’s Singularity Summit, held annually since 2006, attracts technologists, futurists, and social entrepreneurs from around the world, bringing to their attention not only emerging and future technologies but also the basics of AI risk and opportunity. The Singularity Summit also gave the Singularity Institute much of its access to cultural, academic, and business elites.

Another key piece of movement-building work was Yudkowsky’s “The Sequences,” which were written during 2006-2009. Yudkowsky blogged, almost daily, on the subjects of epistemology, language, cognitive biases, decision-making, quantum mechanics, metaethics, and artificial intelligence. These posts were originally published on a community blog about rationality, Overcoming Bias (which later became Hanson’s personal blog). Later, Yudkowsky’s posts were used as the seed material for a new group blog, Less Wrong.

Yudkowsky’s goal was to create a community of people who could avoid common thinking mistakes, change their minds in response to evidence, and generally think and act with an unusual degree of Technical Rationality. In CFAI he had pointed out that when it comes to AI, humanity may not have a second chance to get it right. So we can’t run a series of intelligence explosion experiments and “see what works.” Instead, we need to predict in advance what we need to do to ensure a desirable future, and we need to overcome common thinking errors when doing so. (Later, Yudkowsky expanded his “community of rationalists” by writing the most popular Harry Potter fanfiction in the world, Harry Potter and the Methods of Rationality, and is currently helping to launch a new organization that will teach classes on the skills of rational thought and action.)

This community demonstrated its usefulness in 2009 when Yudkowsky began blogging about some problems in decision theory related to the project of building a Friendly AI. Much like Tim Gowers’ Polymath Project, these discussions demonstrated the power of collaborative problem-solving over the internet. The discussions led to a decision theory workshop and then a decision theory mailing list, which quickly became home to some of the most interesting work in decision theory anywhere in the world. Yudkowsky summarized some of his earlier results in “Timeless Decision Theory” (2010), and newer results have been posted to Less Wrong, for example A model of UDT with a halting oracle and Formulas of arithmetic that behave like decision agents.

The Singularity Institute also built its community with a Visiting Fellows program that hosted groups of researchers for 1-3 months at a time. Together, both visiting fellows and newly hired research fellows produced several working papers between 2009 and 2011, including Machine Ethics and Superintelligence, Implications of a Software-Limited Singularity, Economic Implications of Software Minds, Convergence of Expected Utility for Universal AI, and Ontological Crises in Artificial Agents’ Value Systems.

In 2011, then-president Michael Vassar left the Singularity Institute to help launch a personalized medicine company, and research fellow Luke Muehlhauser (the author of this document) took over leadership from Vassar, as Executive Director. During this time, the Institute underwent a major overhaul to implement best practices for organizational process and management: it published its first strategic plan, began to maintain its first donor database, adopted best practices for accounting and bookkeeping, updated its bylaws and articles of incorporation, adopted more standard roles for the Board of Directors and the Executive Director, held a series of strategic meetings to help decide the near-term goals of the organization, began to publish monthly progress reports to its blog, started outsourcing more work, and began to work on more articles for peer-reviewed publications: as of March 2012, the Singularity Institute has more peer-reviewed publications forthcoming in 2012 than it had published in all of 2001-2011 combined.

Today, the Singularity Institute collaborates regularly with its (non-staff) research associates, and also with researchers at the Future of Humanity Institute at Oxford University (directed by Bostrom), which as of March 2012 is the world’s only other major research institute largely focused on the problems of existential risk.

Robin Hanson

Whereas Yudkowsky has never worked in the for-profit world and had no formal education after high school, Robin Hanson (born 1959) has a long and prestigious academic and professional history. Hanson earned a B.S. in physics from U.C. Irvine in 1981 and an M.S. in physics and an M.A. in the conceptual foundations of science from U. Chicago in 1984, worked in artificial intelligence for Lockheed and NASA, received a Ph.D. in social science from Caltech in 1997, held a post-doctoral fellowship in health policy at U.C. Berkeley from 1997 to 1999, and became an assistant professor of economics at George Mason University in 1999. In economics, he is best known for conceiving of prediction markets.

When Hanson moved to California in 1984, he encountered the Project Xanadu crowd and met Eric Drexler, who showed him an early draft of Engines of Creation. This community discussed AI, nanotech, cryonics, and other transhumanist topics, and Hanson joined the extropians mailing list (along with many others from Project Xanadu) when it launched in 1991.

Hanson has published several papers on the economics of whole brain emulations (what he calls “ems”) and AI (1994, 1998a, 1998b, 2008a, 2008b, 2008c, 2012a). His writings at Overcoming Bias (launched November 2006) are perhaps even more influential, and cover a wide range of topics.

Hanson’s views on AI risk and opportunity differ from Yudkowsky’s. First, Hanson sees the technological singularity and the human-machine conflict it may produce not as a unique event caused by the advent of AI, but as a natural consequence of “the general fact that accelerating rates of change increase intergenerational conflicts” (Hanson 2012b). Second, Hanson thinks an intelligence explosion will be slower and more gradual than Yudkowsky does, denying Yudkowsky’s “hard takeoff” thesis (Hanson & Yudkowsky 2008).

Nick Bostrom

Nick Bostrom (born 1973) received a B.S. in philosophy, mathematics, mathematical logic, and artificial intelligence from the University of Goteborg in 1994, setting a national record in Sweden for undergraduate academic performance. He received an M.A. in philosophy and physics from U. Stockholm in 1996, did work in astrophysics and computational neuroscience at King’s College London, and received his Ph.D. from the London School of Economics in 2000. He went on to be a post-doctoral fellow at Yale University and in 2005 became the founding director of Oxford University’s Future of Humanity Institute (FHI). Without leaving FHI, he became the founding director of Oxford’s Programme on the Impacts of Future Technology (aka FutureTech) in 2011.

Bostrom had long been interested in cognitive enhancement, and in 1995 he joined the extropians mailing list and learned about cryonics, uploading, AI, and other topics.

Bostrom worked with British philosopher David Pearce to found the World Transhumanist Association (now called H+) in 1998, with the purpose of developing a more mature and academically respectable form of transhumanism than was usually present on the extropians mailing list. During this time Bostrom wrote “The Transhumanist FAQ” (now updated to version 2.1), with input from more than 50 others.

His first philosophical publication was “Predictions from Philosophy? How philosophers could make themselves useful” (1997). In this paper, Bostrom proposed “a new type of philosophy, a philosophy whose aim is prediction.” On Bostrom’s view, one role for the philosopher is to be a polymath who can engage in technological prediction and try to figure out how to steer the future so that humanity’s goals are best met.

Bostrom gave three examples of problems this new breed of philosopher-polymath could tackle: the Doomsday argument and anthropics, the Fermi paradox, and superintelligence:

What questions could a philosophy of superintelligence deal with? Well, questions like: How much would the predictive power for various fields increase if we increase the processing speed of a human-like mind a million times? If we extend the short-term or long-term memory? If we increase the neural population and the connection density? What other capacities would a superintelligence have? How easy would it be for it to rediscover the greatest human inventions, and how much input would it need to do so? What is the relative importance of data, theory, and intellectual capacity in various disciplines? Can we know anything about the motivation of a superintelligence? Would it be feasible to preprogram it to be good or philanthropic, or would such rules be hard to reconcile with the flexibility of its cognitive processes? Would a superintelligence, given the desire to do so, be able to outwit humans into promoting its own aims even if we had originally taken strict precautions to avoid being manipulated? Could one use one superintelligence to control another? How would superintelligences communicate with each other? Would they have thoughts which were of a totally different kind from the thoughts that humans can think? Would they be interested in art and religion? Would all superintelligences arrive at more or less the same conclusions regarding all important scientific and philosophical questions, or would they disagree as much as humans do? And how similar in their internal belief-structures would they be? How would our human self-perception and aspirations change if [we] were forced to abdicate the throne of wisdom...? How would we individuate between superminds if they could communicate and fuse and subdivide with enormous speed? Will a notion of personal identity still apply to such interconnected minds? Would they construct an artificial reality in which to live? Could we upload ourselves into that reality? Could we then be able to compete with the superintelligences, if we were accelerated and augmented with extra memory etc., or would such profound reorganisation be necessary that we would no longer feel we were humans? Would that matter?

Bostrom went on to examine some philosophical issues related to superintelligence, in “Predictions from Philosophy” and in “How Long Before Superintelligence?” (1998), “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards” (2002), “Ethical Issues in Advanced Artificial Intelligence” (2003), “The Future of Human Evolution” (2004), and “The Ethics of Artificial Intelligence” (2012, coauthored with Yudkowsky). (He also played out the role of philosopher-polymath with regard to several other topics, including human enhancement and anthropic bias.)

Bostrom’s industriousness paid off:

In 2009, [Bostrom] was awarded the Eugene R. Gannon Award (one person selected annually worldwide from the fields of philosophy, mathematics, the arts and other humanities, and the natural sciences). He has been listed in the FP 100 Global Thinkers list, Foreign Policy Magazine’s list of the world’s top 100 minds. His writings have been translated into more than 21 languages, and there have been some 80 translations or reprints of his works. He has done more than 470 interviews for TV, film, radio, and print media, and he has addressed academic and popular audiences around the world.

The other long-term member of the Future of Humanity Institute, Anders Sandberg, has also published some research on AI risk. Sandberg was a co-author on the whole brain emulation roadmap and “Anthropic Shadow”, and also wrote “Models of the Technological Singularity” and several other papers.

Recently, Bostrom and Sandberg were joined by Stuart Armstrong, who wrote “Anthropic Decision Theory” (2011) and was the lead author on “Thinking Inside the Box: Using and Controlling Oracle AI” (2012). He had previously written Chaining God (2007).

For more than a year, Bostrom has been working on a new book titled Superintelligence: A Strategic Analysis of the Coming Machine Intelligence Revolution, which aims to sum up and organize much of the (published and unpublished) work done in the past decade by researchers at the Singularity Institute and FHI on the subject of AI risk and opportunity, as well as contribute new insights.

AI Risk Goes Mainstream

In 1997, professor of cybernetics Kevin Warwick published March of the Machines, in which he predicted that within a couple decades, machines would become more intelligent than humans, and would pose an existential threat.

In 2000, Sun Microsystems co-founder Bill Joy published “Why the Future Doesn’t Need Us” in Wired magazine. In this widely-circulated essay, Joy argued that “Our most powerful 21st-century technologies — robotics, genetic engineering, and nanotech — are threatening to make humans an endangered species.” Joy advised that we relinquish development of these technologies rather than sprinting headlong into an arms race between destructive uses of these technologies and defenses against those destructive uses.

Many people dismissed Bill Joy as a “Neo-Luddite,” but many experts expressed similar concerns about human extinction, including philosopher John Leslie (The End of the World), physicist Martin Rees (Our Final Hour), legal theorist Richard Posner (Catastrophe: Risk and Response), and the contributors to Global Catastrophic Risks (including Yudkowsky, Hanson, and Bostrom).

Even Ray Kurzweil, known as an optimist about technology, devoted a chapter of his 2005 bestseller The Singularity is Near to a discussion of existential risks, including risks from AI. But though he discussed the possibility of existential catastrophe at length, his take on AI risk was cursory (p. 420):

Inherently there will be no absolute protection against strong AI. Although the argument is subtle I believe that maintaining an open free-market system for incremental scientific and technological progress, in which each step is subject to market acceptance, will provide the most constructive environment for technology to embody widespread human values. As I have pointed out, strong AI is emerging from many diverse efforts and will be deeply integrated into our civilization’s infrastructure. Indeed, it will be intimately embedded in our bodies and brains. As such, it will reflect our values because it will be us.

AI risk finally became a “mainstream” topic in analytic philosophy with Chalmers (2010) and an entire issue of the Journal of Consciousness Studies devoted to the subject.

The earliest popular discussion of machine superintelligence may have been in Christopher Evans’ international bestseller The Mighty Micro (1979), pages 194-198, 231-233, and 237-246.

The Current Situation

Two decades have passed since the early transhumanists began to seriously discuss AI risk and opportunity on the extropians mailing list. (Before that, some discussions took place at the MIT AI lab, but that was before the web was popular, so they weren’t recorded.) What have we humans done since then?

Lots of talking. Hundreds of thousands of man-hours have been invested into discussions on the extropians mailing list, SL4, Overcoming Bias, Less Wrong, the Singularity Institute’s decision theory mailing list, several other internet forums, and also in meat-space (especially in the Bay Area near the Singularity Institute and in Oxford near FHI). These are difficult issues; talking them through is usually the first step to getting anything else done.

Organization. Mailing lists are a form of organization, as are organizations like The Singularity Institute and university departments like the FHI and FutureTech. Established organizations provide opportunities to bring people together, and to pool and direct resources efficiently.

Resources. Many people of considerable wealth, along with thousands of other “concerned citizens” around the world, have decided that AI is the most significant risk and opportunity we face, and are willing to invest in humanity’s future.

Outreach. Publications (both academic and popular), talks, and interactions with major and minor media outlets have been used to raise awareness of AI risk and opportunity. This has included outreach to specific AGI researchers, some of whom now take AI safety quite seriously. This also includes outreach to people in positions of influence who are in a position to engage in differential technological development. It also includes outreach to the rapidly growing “optimal philanthropy” community; a large fraction of those associated with Giving What We Can take existential risk — and AI risk in particular — quite seriously.

Research. So far, most research on the topic has been concerned with trying to become less confused about what, exactly, the problem is, how worried we should be, and which strategic actions we should take. How do we predict technological progress? How can we predict AI outcomes? Which interventions, taken now, would probably increase the odds of positive AI outcomes? There has also been some “technical” research in decision theory (e.g. TDT, UDT, ADT), the math of AI goal systems (“Learning What to Value,” “Ontological Crises in Artificial Agents’ Value Systems,” “Convergence of Expected Utility for Universal AI”), and Yudkowsky’s unpublished research on Friendly AI.

Muehlhauser (2011) provides an overview of the categories of research problems we have left to solve. Most of the known problems aren’t even well-defined at this point.

References