AI Risk and Opportunity: Humanity’s Efforts So Far

Part of the se­ries AI Risk and Op­por­tu­nity: A Strate­gic Anal­y­sis.

(You can leave anony­mous feed­back on posts in this se­ries here. I alone will read the com­ments, and may use them to im­prove past and forth­com­ing posts in this se­ries.)

This post chron­i­cles the story of hu­man­ity’s grow­ing aware­ness of AI risk and op­por­tu­nity, along with some re­cent AI safety efforts. I will not tackle any strat­egy ques­tions di­rectly in this post; my pur­pose to­day is merely to “bring ev­ery­one up to speed.”

I know my post skips many im­por­tant events and peo­ple. Please sug­gest ad­di­tions in the com­ments, and in­clude as much de­tail as pos­si­ble.

Early history

Late in the In­dus­trial Revolu­tion, Sa­muel But­ler (1863) wor­ried about what might hap­pen when ma­chines be­come more ca­pa­ble than the hu­mans who de­signed them:

...we are our­selves cre­at­ing our own suc­ces­sors; we are daily adding to the beauty and del­i­cacy of their phys­i­cal or­gani­sa­tion; we are daily giv­ing them greater power and sup­ply­ing by all sorts of in­ge­nious con­trivances that self-reg­u­lat­ing, self-act­ing power which will be to them what in­tel­lect has been to the hu­man race. In the course of ages we shall find our­selves the in­fe­rior race.

...the time will come when the ma­chines will hold the real supremacy over the world and its in­hab­itants...

This ba­sic idea was picked up by sci­ence fic­tion au­thors, for ex­am­ple in the 1921 Czech play that in­tro­duced the term “robot,” R.U.R. In that play, robots grow in power and in­tel­li­gence and de­stroy the en­tire hu­man race, ex­cept for a sin­gle sur­vivor.

Another exploration of this idea is found in John W. Campbell’s (1932) short story The Last Evolution, in which aliens attack Earth and the humans and aliens are killed but their machines survive and inherit the solar system. Campbell’s (1935) short story The Machine contained perhaps the earliest description of recursive self-improvement:

On the planet Dwranl, of the star you know as Sirius, a great race lived, and they were not too un­like you hu­mans. …they at­tained their goal of the ma­chine that could think. And be­cause it could think, they made sev­eral and put them to work, largely on sci­en­tific prob­lems, and one of the ob­vi­ous prob­lems was how to make a bet­ter ma­chine which could think.

The ma­chines had logic, and they could think con­stantly, and be­cause of their con­struc­tion never for­got any­thing they thought it well to re­mem­ber. So the ma­chine which had been set the task of mak­ing a bet­ter ma­chine ad­vanced slowly, and as it im­proved it­self, it ad­vanced more and more rapidly. The Ma­chine which came to Earth is that ma­chine.

The con­cern for AI safety is most pop­u­larly iden­ti­fied with Isaac Asi­mov’s Three Laws of Robotics, in­tro­duced in his short story Ru­naround. Asi­mov used his sto­ries, in­clud­ing those col­lected in the pop­u­lar book I, Robot, to illus­trate many of the ways in which such well-mean­ing and seem­ingly com­pre­hen­sive rules for gov­ern­ing robot be­hav­ior could go wrong.

In the year of I, Robot’s re­lease, math­e­mat­i­cian Alan Tur­ing (1950) noted that ma­chines may one day be ca­pa­ble of what­ever hu­man in­tel­li­gence can achieve:

I be­lieve that at the end of the cen­tury… one will be able to speak of ma­chines think­ing with­out ex­pect­ing to be con­tra­dicted.

Turing (1951) concluded:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers… At some stage therefore we should have to expect the machines to take control...

Given the profound im­pli­ca­tions of ma­chine in­tel­li­gence, it’s rather alarm­ing that the early AI sci­en­tists who be­lieved AI would be built dur­ing the 1950s-1970s didn’t show much in­ter­est in AI safety. We are lucky they were wrong about the difficulty of AI — had they been right, hu­man­ity prob­a­bly would not have been pre­pared to pro­tect its in­ter­ests.

Later, statis­ti­cian I.J. Good (1959), who had worked with Tur­ing to crack Nazi codes in World War II, rea­soned that the tran­si­tion from hu­man con­trol to ma­chine con­trol may be un­ex­pect­edly sud­den:

Once a ma­chine is de­signed that is good enough… it can be put to work de­sign­ing an even bet­ter ma­chine. At this point an “ex­plo­sion” will clearly oc­cur; all the prob­lems of sci­ence and tech­nol­ogy will be handed over to ma­chines and it will no longer be nec­es­sary for peo­ple to work. Whether this will lead to a Utopia or to the ex­ter­mi­na­tion of the hu­man race will de­pend on how the prob­lem is han­dled by the ma­chines. The im­por­tant thing will be to give them the aim of serv­ing hu­man be­ings.

The more fa­mous for­mu­la­tion of this idea, and the ori­gin of the phrase “in­tel­li­gence ex­plo­sion,” is from Good (1965):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

Good (1970) says that “...by 1980 I hope that the implications and the safeguards [concerning machine superintelligence] will have been thoroughly discussed,” and argues that an association devoted to discussing the matter should be created. Unfortunately, no such association was created until either 1991 (Extropy Institute) or 2000 (Singularity Institute), and we might say these issues have not to this day been “thoroughly” discussed.

Good (1982) pro­posed a plan for the de­sign of an eth­i­cal ma­chine:

I en­visage a ma­chine that would be given a large num­ber of ex­am­ples of hu­man be­havi­our that other peo­ple called eth­i­cal, and ex­am­ples of dis­cus­sions of ethics, and from these ex­am­ples and dis­cus­sions the ma­chine would for­mu­late one or more con­sis­tent gen­eral the­o­ries of ethics, de­tailed enough so that it could de­duce the prob­a­ble con­se­quences in most re­al­is­tic situ­a­tions.

Even crit­ics of AI like Jack Schwartz (1987) saw the im­pli­ca­tions of in­tel­li­gence that can im­prove it­self:

If ar­tifi­cial in­tel­li­gences can be cre­ated at all, there is lit­tle rea­son to be­lieve that ini­tial suc­cesses could not lead swiftly to the con­struc­tion of ar­tifi­cial su­per­in­tel­li­gences able to ex­plore sig­nifi­cant math­e­mat­i­cal, sci­en­tific, or en­g­ineer­ing al­ter­na­tives at a rate far ex­ceed­ing hu­man abil­ity, or to gen­er­ate plans and take ac­tion on them with equally over­whelming speed. Since man’s near-monopoly of all higher forms of in­tel­li­gence has been one of the most ba­sic facts of hu­man ex­is­tence through­out the past his­tory of this planet, such de­vel­op­ments would clearly cre­ate a new eco­nomics, a new so­ciol­ogy, and a new his­tory.

Ray Solomonoff (1985), founder of al­gorith­mic in­for­ma­tion the­ory, spec­u­lated on the im­pli­ca­tions of full-blown AI:

After we have reached [hu­man-level AI], it shouldn’t take much more than ten years to con­struct ten thou­sand du­pli­cates of our origi­nal [hu­man-level AI], and have a to­tal com­put­ing ca­pa­bil­ity close to that of the com­puter sci­ence com­mu­nity...

The last 100 years have seen the introduction of special and general relativity, automobiles, airplanes, quantum mechanics, large rockets and space travel, fission power, fusion bombs, lasers, and large digital computers. Any one of these might take a person years to appreciate and understand. Suppose that they had all been presented to mankind in a single year!

Moravec (1988) argued that AI was an existential risk, but nevertheless, one toward which we must run (pp. 100-101):

...intelligent machines… threaten our existence… Machines merely as clever as human beings will have enormous advantages in competitive situations… So why rush headlong into an era of intelligent machines? The answer, I believe, is that we have very little choice, if our culture is to remain viable… The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will expand, or we will be invaded from the stars, or a black hole will swallow the galaxy. The bigger, more diverse, and competent a culture is, the better it can detect and deal with external dangers. The larger events happen less frequently. By growing rapidly enough, a culture has a finite chance of surviving forever.

Ray Kurzweil’s The Age of Intelligent Machines (1990) did not mention AI risk, and his followup, The Age of Spiritual Machines (1998), does so only briefly, in an “interview” between the reader and Kurzweil. The reader asks, “So we risk the survival of the human race for [the opportunity AI affords us to expand our minds and advance our ability to create knowledge]?” Kurzweil answers: “Yeah, basically.”

Minsky (1984) pointed out the difficulty of getting machines to do what we want:

...it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer’s Apprentice, or the Monkey’s Paw by W.W. Jacobs.

[Another] risk is ex­po­sure to the con­se­quences of self-de­cep­tion. It is always tempt­ing to say to one­self… that “I know what I would like to hap­pen, but I can’t quite ex­press it clearly enough.” How­ever, that con­cept it­self re­flects a too-sim­plis­tic self-image, which por­trays one’s own self as [hav­ing] well-defined wishes, in­ten­tions, and goals. This pre-Freudian image serves to ex­cuse our fre­quent ap­pear­ances of am­bivalence; we con­vince our­selves that clar­ify­ing our in­ten­tions is merely a mat­ter of straight­en­ing-out the in­put-out­put chan­nels be­tween our in­ner and outer selves. The trou­ble is, we sim­ply aren’t made that way. Our goals them­selves are am­bigu­ous.

The ul­ti­mate risk comes when [we] at­tempt to take that fi­nal step — of de­sign­ing goal-achiev­ing pro­grams that are pro­grammed to make them­selves grow in­creas­ingly pow­er­ful, by self-evolv­ing meth­ods that aug­ment and en­hance their own ca­pa­bil­ities. It will be tempt­ing to do this, both to gain power and to de­crease our own effort to­ward clar­ify­ing our own de­sires. If some ge­nie offered you three wishes, would not your first one be, “Tell me, please, what is it that I want the most!” The prob­lem is that, with such pow­er­ful ma­chines, it would re­quire but the slight­est ac­ci­dent of care­less de­sign for them to place their goals ahead of [ours]. The ma­chine’s goals may be allegedly benev­olent, as with the robots of With Folded Hands, by Jack Willi­am­son, whose ex­plicit pur­pose was allegedly benev­olent: to pro­tect us from harm­ing our­selves, or as with the robot in Colos­sus, by D.H.Jones, who it­self de­cides, at what­ever cost, to save us from an un­sus­pected en­emy. In the case of Arthur C. Clarke’s HAL, the ma­chine de­cides that the mis­sion we have as­signed to it is one we can­not prop­erly ap­pre­ci­ate. And in Ver­nor Vinge’s com­puter-game fan­tasy, True Names, the dreaded Mail­man… evolves new am­bi­tions of its own.

The Modern Era

Novelist Ver­nor Vinge (1993) pop­u­larized Good’s “in­tel­li­gence ex­plo­sion” con­cept, and wrote the first novel about self-im­prov­ing AI pos­ing an ex­is­ten­tial threat: A Fire Upon the Deep (1992). It was prob­a­bly Vinge who did more than any­one else to spur dis­cus­sions about AI risk, par­tic­u­larly in on­line com­mu­ni­ties like the ex­tropi­ans mailing list (since 1991) and SL4 (since 2000). Par­ti­ci­pants in these early dis­cus­sions in­cluded sev­eral of to­day’s lead­ing thinkers on AI risk: Robin Han­son, Eliezer Yud­kowsky, Nick Bostrom, An­ders Sand­berg, and Ben Go­ertzel. (Other posters in­cluded Peter Thiel, FM-2030, Robert Brad­bury, and Ju­lian As­sange.) Pro­pos­als like Friendly AI, Or­a­cle AI, and Nanny AI were dis­cussed here long be­fore they were brought to greater promi­nence with aca­demic pub­li­ca­tions (Yud­kowsky 2008; Arm­strong et al. 2012; Go­ertzel 2012).

Mean­while, philoso­phers and AI re­searchers con­sid­ered whether or not ma­chines could have moral value, and how to en­sure eth­i­cal be­hav­ior from less pow­er­ful ma­chines or ‘nar­row AIs’, a field of in­quiry var­i­ously known as ‘ar­tifi­cial moral­ity’ (Daniel­son 1992; Floridi & San­ders 2004; Allen et al. 2000), ‘ma­chine ethics’ (Hall 2000; McLaren 2005; An­der­son & An­der­son 2006), ‘com­pu­ta­tional ethics’ (Allen 2002) and ‘com­pu­ta­tional metaethics’ (Lokhorst 2011), and ‘robo-ethics’ or ‘robot ethics’ (Ca­purro et al. 2006; Sawyer 2007). This vein of re­search — what I’ll call the ‘ma­chine ethics’ liter­a­ture — was re­cently sum­ma­rized in two books: Wal­lach & Allen (2009); An­der­son & An­der­son (2011). Thus far, there has been a sig­nifi­cant com­mu­ni­ca­tion gap be­tween the ma­chine ethics liter­a­ture and the AI risk liter­a­ture (Allen and Wal­lach 2011), ex­cept­ing per­haps Muehlhauser and Helm (2012).

The topic of AI safety in the context of existential risk was left to the futurists who had participated in online discussions of AI risk and opportunity. Here, I must cut short my review and focus on just three (of many) important figures: Eliezer Yudkowsky, Robin Hanson, and Nick Bostrom. (Your author also apologizes for the fact that, because he works with Yudkowsky, Yudkowsky gets a more detailed treatment here than Hanson or Bostrom.)

Other figures in the modern era of AI risk research include Bill Hibbard (Super-Intelligent Machines) and Ben Goertzel (“Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?”).

Eliezer Yudkowsky

Ac­cord­ing to “Eliezer, the per­son,” Eliezer Yud­kowsky (born 1979) was a bright kid — in the 99.9998th per­centile of cog­ni­tive abil­ity, ac­cord­ing to the Mid­west Ta­lent Search. He read lots of sci­ence fic­tion as a child, and at age 11 read Great Mambo Chicken and the Tran­shu­man Con­di­tion — his in­tro­duc­tion to the im­pend­ing re­al­ity of tran­shu­man­ist tech­nolo­gies like AI and nan­otech. The mo­ment he be­came a Sin­gu­lar­i­tar­ian was the mo­ment he read page 47 of True Names and Other Dangers by Ver­nor Vinge:

Here I had tried a straight­for­ward ex­trap­o­la­tion of tech­nol­ogy, and found my­self pre­cip­i­tated over an abyss. It’s a prob­lem we face ev­ery time we con­sider the cre­ation of in­tel­li­gences greater than our own. When this hap­pens, hu­man his­tory will have reached a kind of sin­gu­lar­ity—a place where ex­trap­o­la­tion breaks down and new mod­els must be ap­plied—and the world will pass be­yond our un­der­stand­ing.

Yud­kowsky re­ported his re­ac­tion:

My emo­tions at that mo­ment are hard to de­scribe; not fa­nat­i­cism, or en­thu­si­asm, just a vast feel­ing of “Yep. He’s right.” I knew, in the mo­ment I read that sen­tence, that this was how I would be spend­ing the rest of my life.

(As an aside, I’ll note that this is eerily similar to my own ex­pe­rience of en­coun­ter­ing the fa­mous I.J. Good para­graph about ul­train­tel­li­gence (quoted above), be­fore I knew what “tran­shu­man­ism” or “the Sin­gu­lar­ity” was. I read Good’s para­graph and thought, “Wow. That’s… prob­a­bly cor­rect. How could I have missed that im­pli­ca­tion? … … … Well, shit. That changes ev­ery­thing.”)

As a teenager in the mid 1990s, Yud­kowsky par­ti­ci­pated heav­ily in Sin­gu­lar­i­tar­ian dis­cus­sions on the ex­tropi­ans mailing list, and in 1996 (at age 17) he wrote “Star­ing into the Sin­gu­lar­ity,” which gained him much at­ten­tion, as did his pop­u­lar “FAQ about the Mean­ing of Life” (1999).

In 1998 Yud­kowsky was in­vited (along with 33 oth­ers) by economist Robin Han­son to com­ment on Vinge (1993). Thir­teen peo­ple (in­clud­ing Yud­kowsky) left com­ments, then Vinge re­sponded, and a fi­nal open dis­cus­sion was held on the ex­tropi­ans mailing list. Han­son ed­ited to­gether these re­sults here. Yud­kowsky thought Max More’s com­ments on Vinge un­der­es­ti­mated how differ­ent from hu­mans AI would prob­a­bly be, and this prompted Yud­kowsky to be­gin an early draft of “Cod­ing a Tran­shu­man AI” (CaTAI) which by 2000 had grown into the first large ex­pli­ca­tion of his thoughts on “Seed AI” and “friendly” ma­chine su­per­in­tel­li­gence (Yud­kowsky 2000).

Around this same time, Yud­kowsky wrote “The Plan to the Sin­gu­lar­ity” and “The Sin­gu­lar­i­tar­ian Prin­ci­ples,” and launched the SL4 mailing list.

At a May 2000 gath­er­ing hosted by the Fore­sight In­sti­tute, Brian Atk­ins and Sabine Stoeckel dis­cussed with Yud­kowsky the pos­si­bil­ity of launch­ing an or­ga­ni­za­tion spe­cial­iz­ing in AI safety. In July of that year, Yud­kowsky formed the Sin­gu­lar­ity In­sti­tute and be­gan his full-time re­search on the prob­lems of AI risk and op­por­tu­nity.

In 2001, he pub­lished two “se­quels” to CaTAI, “Gen­eral In­tel­li­gence and Seed AI” and, most im­por­tantly, “Creat­ing Friendly AI” (CFAI) (Yud­kowsky 2001).

The pub­li­ca­tion of CFAI was a sig­nifi­cant event, prompt­ing Ben Go­ertzel (the pi­o­neer of the new Ar­tifi­cial Gen­eral In­tel­li­gence re­search com­mu­nity) to say that “Creat­ing Friendly AI is the most in­tel­li­gent writ­ing about AI that I’ve read in many years,” and prompt­ing Eric Drexler (the pi­o­neer of molec­u­lar man­u­fac­tur­ing) to write that “With Creat­ing Friendly AI, the Sin­gu­lar­ity In­sti­tute has be­gun to fill in one of the great­est re­main­ing blank spots in the pic­ture of hu­man­ity’s fu­ture.”

CFAI was both frustrating and brilliant. It was frustrating because: (1) it was disorganized and opaque, (2) it invented new terms instead of using the terms being used by everyone else, for example speaking of “supergoals” and “subgoals” instead of final and instrumental goals, and speaking of goal systems but never “utility functions,” and (3) it hardly cited any of the relevant works in AI, philosophy, and psychology; for example, it could have cited McCulloch (1952), Good (1959, 1970, 1982), Cade (1966), Versenyi (1974), Evans (1979), Lampson (1979), the conversation with Ed Fredkin in McCorduck (1979), Sloman (1984), Schmidhuber (1987), Waldrop (1987), Pearl (1989), De Landa (1991), Crevier (1993, ch. 12), Clarke (1993, 1994), Weld & Etzioni (1994), Buss (1995), Russell & Norvig (1995), Gips (1995), Whitby (1996), Schmidhuber et al. (1997), Barto & Sutton (1998), Jackson (1998), Levitt (1999), Moravec (1999), Kurzweil (1999), Sobel (1999), Allen et al. (2000), Gordon (2000), Harper (2000), Coleman (2001), and Hutter (2001). These features still substantially characterize Yudkowsky’s independent writing, e.g. see Yudkowsky (2010). As late as January 2006, he still wrote that “It is not that I have neglected to cite the existing major works on this topic, but that, to the best of my ability to discern, there are no existing major works to cite.”

On the other hand, CFAI was in many ways brilliant, and it tackled many of the problems left mostly untouched by mainstream machine ethics researchers. For example, CFAI (but not the mainstream machine ethics literature) engaged the problems of: (1) radically self-improving AI, (2) AI as an existential risk, (3) hard takeoff, (4) the interplay of goal content, acquisition, and structure, (5) wireheading, (6) subgoal stomp, (7) external reference semantics, (8) causal validity semantics, and (9) selective support (which Bostrom (2002) would later call “differential technological development”).

For many years, the Singularity Institute was little more than a vehicle for Yudkowsky’s research. In 2002 he wrote “Levels of Organization in General Intelligence,” which later appeared in the first edited volume on Artificial General Intelligence (AGI). In 2003 he wrote what would become the internet’s most popular tutorial on Bayes’ Theorem, followed in 2005 by “A Technical Explanation of Technical Explanation.” In 2004 he explained his vision of a Friendly AI goal structure: “Coherent Extrapolated Volition.” In 2006 he wrote two chapters that would later appear in the volume Global Catastrophic Risks from Oxford University Press (co-edited by Bostrom): “Cognitive Biases Potentially Affecting Judgment of Global Risks” and, what remains his “classic” article on the need for Friendly AI, “Artificial Intelligence as a Positive and Negative Factor in Global Risk.”

In 2004, Tyler Emerson was hired as the Singularity Institute’s executive director. Emerson brought on Nick Bostrom (then a post-doctoral fellow at Yale), Christine Peterson (of the Foresight Institute), and others, as advisors. In February 2006, PayPal co-founder Peter Thiel donated $100,000 to the Singularity Institute, and, we might say, the Singularity Institute as we know it today was born.

From 2005-2007, Yud­kowsky worked at var­i­ous times with Mar­cello Her­reshoff, Nick Hay and Peter de Blanc on the tech­ni­cal prob­lems of AGI nec­es­sary for tech­ni­cal FAI work, for ex­am­ple cre­at­ing AIXI-like ar­chi­tec­tures, de­vel­op­ing a re­flec­tive de­ci­sion the­ory, and in­ves­ti­gat­ing limits in­her­ent in self-re­flec­tion due to Löb’s The­o­rem. Al­most none of this re­search has been pub­lished, in part be­cause of the de­sire not to ac­cel­er­ate AGI re­search with­out hav­ing made cor­re­spond­ing safety progress. (Mar­cello also worked with Eliezer dur­ing the sum­mer of 2009.)

Much of the Sin­gu­lar­ity In­sti­tute’s work has been “move­ment-build­ing” work. The in­sti­tute’s Sin­gu­lar­ity Sum­mit, held an­nu­ally since 2006, at­tracts tech­nol­o­gists, fu­tur­ists, and so­cial en­trepreneurs from around the world, bring­ing to their at­ten­tion not only emerg­ing and fu­ture tech­nolo­gies but also the ba­sics of AI risk and op­por­tu­nity. The Sin­gu­lar­ity Sum­mit also gave the Sin­gu­lar­ity In­sti­tute much of its ac­cess to cul­tural, aca­demic, and busi­ness elites.

Another key piece of move­ment-build­ing work was Yud­kowsky’s “The Se­quences,” which were writ­ten dur­ing 2006-2009. Yud­kowsky blogged, al­most daily, on the sub­jects of episte­mol­ogy, lan­guage, cog­ni­tive bi­ases, de­ci­sion-mak­ing, quan­tum me­chan­ics, metaethics, and ar­tifi­cial in­tel­li­gence. Th­ese posts were origi­nally pub­lished on a com­mu­nity blog about ra­tio­nal­ity, Over­com­ing Bias (which later be­came Han­son’s per­sonal blog). Later, Yud­kowsky’s posts were used as the seed ma­te­rial for a new group blog, Less Wrong.

Yud­kowsky’s goal was to cre­ate a com­mu­nity of peo­ple who could avoid com­mon think­ing mis­takes, change their minds in re­sponse to ev­i­dence, and gen­er­ally think and act with an un­usual de­gree of Tech­ni­cal Ra­tion­al­ity. In CFAI he had pointed out that when it comes to AI, hu­man­ity may not have a sec­ond chance to get it right. So we can’t run a se­ries of in­tel­li­gence ex­plo­sion ex­per­i­ments and “see what works.” In­stead, we need to pre­dict in ad­vance what we need to do to en­sure a de­sir­able fu­ture, and we need to over­come com­mon think­ing er­rors when do­ing so. (Later, Yud­kowsky ex­panded his “com­mu­nity of ra­tio­nal­ists” by writ­ing the most pop­u­lar Harry Pot­ter fan­fic­tion in the world, Harry Pot­ter and the Meth­ods of Ra­tion­al­ity, and is cur­rently helping to launch a new or­ga­ni­za­tion that will teach classes on the skills of ra­tio­nal thought and ac­tion.)

This com­mu­nity demon­strated its use­ful­ness in 2009 when Yud­kowsky be­gan blog­ging about some prob­lems in de­ci­sion the­ory re­lated to the pro­ject of build­ing a Friendly AI. Much like Tim Gow­ers’ Poly­math Pro­ject, these dis­cus­sions demon­strated the power of col­lab­o­ra­tive prob­lem-solv­ing over the in­ter­net. The dis­cus­sions led to a de­ci­sion the­ory work­shop and then a de­ci­sion the­ory mailing list, which quickly be­came home to some of the most in­ter­est­ing work in de­ci­sion the­ory any­where in the world. Yud­kowsky sum­ma­rized some of his ear­lier re­sults in “Time­less De­ci­sion The­ory” (2010), and newer re­sults have been posted to Less Wrong, for ex­am­ple A model of UDT with a halt­ing or­a­cle and For­mu­las of ar­ith­metic that be­have like de­ci­sion agents.

The Sin­gu­lar­ity In­sti­tute also built its com­mu­nity with a Visit­ing Fel­lows pro­gram that hosted groups of re­searchers for 1-3 months at a time. To­gether, both vis­it­ing fel­lows and newly hired re­search fel­lows pro­duced sev­eral work­ing pa­pers be­tween 2009 and 2011, in­clud­ing Ma­chine Ethics and Su­per­in­tel­li­gence, Im­pli­ca­tions of a Soft­ware-Limited Sin­gu­lar­ity, Eco­nomic Im­pli­ca­tions of Soft­ware Minds, Con­ver­gence of Ex­pected Utility for Univer­sal AI, and On­tolog­i­cal Crises in Ar­tifi­cial Agents’ Value Sys­tems.

In 2011, then-pres­i­dent Michael Vas­sar left the Sin­gu­lar­ity In­sti­tute to help launch a per­son­al­ized medicine com­pany, and re­search fel­low Luke Muehlhauser (the au­thor of this doc­u­ment) took over lead­er­ship from Vas­sar, as Ex­ec­u­tive Direc­tor. Dur­ing this time, the In­sti­tute un­der­went a ma­jor over­haul to im­ple­ment best prac­tices for or­ga­ni­za­tional pro­cess and man­age­ment: it pub­lished its first strate­gic plan, be­gan to main­tain its first donor database, adopted best prac­tices for ac­count­ing and book­keep­ing, up­dated its by­laws and ar­ti­cles of in­cor­po­ra­tion, adopted more stan­dard roles for the Board of Direc­tors and the Ex­ec­u­tive Direc­tor, held a se­ries of strate­gic meet­ings to help de­cide the near-term goals of the or­ga­ni­za­tion, be­gan to pub­lish monthly progress re­ports to its blog, started out­sourc­ing more work, and be­gan to work on more ar­ti­cles for peer-re­viewed pub­li­ca­tions: as of March 2012, the Sin­gu­lar­ity In­sti­tute has more peer-re­viewed pub­li­ca­tions forth­com­ing in 2012 than it had pub­lished in all of 2001-2011 com­bined.

To­day, the Sin­gu­lar­ity In­sti­tute col­lab­o­rates reg­u­larly with its (non-staff) re­search as­so­ci­ates, and also with re­searchers at the Fu­ture of Hu­man­ity In­sti­tute at Oxford Univer­sity (di­rected by Bostrom), which as of March 2012 is the world’s only other ma­jor re­search in­sti­tute largely fo­cused on the prob­lems of ex­is­ten­tial risk.

Robin Hanson

Whereas Yudkowsky has never worked in the for-profit world and had no formal education after high school, Robin Hanson (born 1959) has a long and prestigious academic and professional history. Hanson took a B.S. in physics from U.C. Irvine in 1981, took an M.S. in physics and an M.A. in the conceptual foundations of science from U. Chicago in 1984, worked in artificial intelligence for Lockheed and NASA, got a Ph.D. in social science from Caltech in 1997, did a post-doctoral fellowship at U.C. Berkeley in health policy from 1997-1999, and finally was made an assistant professor of economics at George Mason University in 1999. In economics, he is best known for conceiving of prediction markets.

When Han­son moved to Cal­ifor­nia in 1984, he en­coun­tered the Pro­ject Xanadu crowd and met Eric Drexler, who showed him an early draft of Eng­ines of Creation. This com­mu­nity dis­cussed AI, nan­otech, cry­on­ics, and other tran­shu­man­ist top­ics, and Han­son joined the ex­tropi­ans mailing list (along with many oth­ers from Pro­ject Xanadu) when it launched in 1991.

Han­son has pub­lished sev­eral pa­pers on the eco­nomics of whole brain em­u­la­tions (what he calls “ems”) and AI (1994, 1998a, 1998b, 2008a, 2008b, 2008c, 2012a). His writ­ings at Over­com­ing Bias (launched Novem­ber 2006) are per­haps even more in­fluen­tial, and cover a wide range of top­ics.

Han­son’s views on AI risk and op­por­tu­nity differ from Yud­kowsky’s. First, Han­son sees the tech­nolog­i­cal sin­gu­lar­ity and the hu­man-ma­chine con­flict it may pro­duce not as a unique event caused by the ad­vent of AI, but as a nat­u­ral con­se­quence of “the gen­eral fact that ac­cel­er­at­ing rates of change in­crease in­ter­gen­er­a­tional con­flicts” (Han­son 2012b). Se­cond, Han­son thinks an in­tel­li­gence ex­plo­sion will be slower and more grad­ual than Yud­kowsky does, deny­ing Yud­kowsky’s “hard take­off” the­sis (Han­son & Yud­kowsky 2008).

Nick Bostrom

Nick Bostrom (born 1973) received a B.S. in philosophy, mathematics, mathematical logic, and artificial intelligence from the University of Goteborg in 1994, setting a national record in Sweden for undergraduate academic performance. He received an M.A. in philosophy and physics from U. Stockholm in 1996, did work in astrophysics and computational neuroscience at King’s College London, and received his Ph.D. from the London School of Economics in 2000. He went on to be a post-doctoral fellow at Yale University and in 2005 became the founding director of Oxford University’s Future of Humanity Institute (FHI). Without leaving FHI, he became the founding director of Oxford’s Programme on the Impacts of Future Technology (aka FutureTech) in 2011.

Bostrom had long been in­ter­ested in cog­ni­tive en­hance­ment, and in 1995 he joined the ex­tropi­ans mailing list and learned about cry­on­ics, up­load­ing, AI, and other top­ics.

Bostrom worked with British philosopher David Pearce to found the World Transhumanist Association (now called H+) in 1998, with the purpose of developing a more mature and academically respectable form of transhumanism than was usually present on the extropians mailing list. During this time Bostrom wrote “The Transhumanist FAQ” (now updated to version 2.1), with input from more than 50 others.

His first philo­soph­i­cal pub­li­ca­tion was “Pre­dic­tions from Philos­o­phy? How philoso­phers could make them­selves use­ful” (1997). In this pa­per, Bostrom pro­posed “a new type of philos­o­phy, a philos­o­phy whose aim is pre­dic­tion.” On Bostrom’s view, one role for the philoso­pher is to be a poly­math who can en­gage in tech­nolog­i­cal pre­dic­tion and try to figure out how to steer the fu­ture so that hu­man­ity’s goals are best met.

Bostrom gave three ex­am­ples of prob­lems this new breed of philoso­pher-poly­math could tackle: the Dooms­day ar­gu­ment and an­throp­ics, the Fermi para­dox, and su­per­in­tel­li­gence:

What questions could a philosophy of superintelligence deal with? Well, questions like: How much would the predictive power for various fields increase if we increase the processing speed of a human-like mind a million times? If we extend the short-term or long-term memory? If we increase the neural population and the connection density? What other capacities would a superintelligence have? How easy would it be for it to rediscover the greatest human inventions, and how much input would it need to do so? What is the relative importance of data, theory, and intellectual capacity in various disciplines? Can we know anything about the motivation of a superintelligence? Would it be feasible to preprogram it to be good or philanthropic, or would such rules be hard to reconcile with the flexibility of its cognitive processes? Would a superintelligence, given the desire to do so, be able to outwit humans into promoting its own aims even if we had originally taken strict precautions to avoid being manipulated? Could one use one superintelligence to control another? How would superintelligences communicate with each other? Would they have thoughts which were of a totally different kind from the thoughts that humans can think? Would they be interested in art and religion? Would all superintelligences arrive at more or less the same conclusions regarding all important scientific and philosophical questions, or would they disagree as much as humans do? And how similar in their internal belief-structures would they be? How would our human self-perception and aspirations change if we were forced to abdicate the throne of wisdom...? How would we individuate between superminds if they could communicate and fuse and subdivide with enormous speed? Will a notion of personal identity still apply to such interconnected minds? Would they construct an artificial reality in which to live? Could we upload ourselves into that reality? Could we then be able to compete with the superintelligences, if we were accelerated and augmented with extra memory etc., or would such profound reorganisation be necessary that we would no longer feel we were humans? Would that matter?

Bostrom went on to ex­am­ine some philo­soph­i­cal is­sues re­lated to su­per­in­tel­li­gence, in “Pre­dic­tions from Philos­o­phy” and in “How Long Be­fore Su­per­in­tel­li­gence?” (1998), “Ex­is­ten­tial Risks: An­a­lyz­ing Hu­man Ex­tinc­tion Sce­nar­ios and Re­lated Hazards” (2002), “Eth­i­cal Is­sues in Ad­vanced Ar­tifi­cial In­tel­li­gence” (2003), “The Fu­ture of Hu­man Evolu­tion” (2004), and “The Ethics of Ar­tifi­cial In­tel­li­gence” (2012, coau­thored with Yud­kowsky). (He also played out the role of philoso­pher-poly­math with re­gard to sev­eral other top­ics, in­clud­ing hu­man en­hance­ment and an­thropic bias.)

Bostrom’s in­dus­tri­ous­ness paid off:

In 2009, [Bostrom] was awarded the Eugene R. Gannon Award (one person selected annually worldwide from the fields of philosophy, mathematics, the arts and other humanities, and the natural sciences). He has been listed in the FP 100 Global Thinkers list, the Foreign Policy Magazine’s list of the world’s top 100 minds. His writings have been translated into more than 21 languages, and there have been some 80 translations or reprints of his works. He has done more than 470 interviews for TV, film, radio, and print media, and he has addressed academic and popular audiences around the world.

The other long-term mem­ber of the Fu­ture of Hu­man­ity In­sti­tute, An­ders Sand­berg, has also pub­lished some re­search on AI risk. Sand­berg was a co-au­thor on the whole brain em­u­la­tion roadmap and “An­thropic Shadow”, and also wrote “Models of the Tech­nolog­i­cal Sin­gu­lar­ity” and sev­eral other pa­pers.

Re­cently, Bostrom and Sand­berg were joined by Stu­art Arm­strong, who wrote “An­thropic De­ci­sion The­ory” (2011) and was the lead au­thor on “Think­ing In­side the Box: Us­ing and Con­trol­ling Or­a­cle AI” (2012). He had pre­vi­ously writ­ten Chain­ing God (2007).

For more than a year, Bostrom has been work­ing on a new book ti­tled Su­per­in­tel­li­gence: A Strate­gic Anal­y­sis of the Com­ing Ma­chine In­tel­li­gence Revolu­tion, which aims to sum up and or­ga­nize much of the (pub­lished and un­pub­lished) work done in the past decade by re­searchers at the Sin­gu­lar­ity In­sti­tute and FHI on the sub­ject of AI risk and op­por­tu­nity, as well as con­tribute new in­sights.

AI Risk Goes Mainstream

In 1997, professor of cybernetics Kevin Warwick published March of the Machines, in which he predicted that within a couple of decades machines would become more intelligent than humans and would pose an existential threat.

In 2000, Sun Microsystems co-founder Bill Joy published “Why the Future Doesn’t Need Us” in Wired magazine. In this widely circulated essay, Joy argued that “Our most powerful 21st-century technologies — robotics, genetic engineering, and nanotech — are threatening to make humans an endangered species.” Joy advised that we relinquish development of these technologies rather than sprint headlong into an arms race between destructive uses of these technologies and defenses against those destructive uses.

Many peo­ple dis­missed Bill Joy as a “Neo-Lud­dite,” but many ex­perts ex­pressed similar con­cerns about hu­man ex­tinc­tion, in­clud­ing philoso­pher John Les­lie (The End of the World), physi­cist Martin Rees (Our Fi­nal Hour), le­gal the­o­rist Richard Pos­ner (Catas­tro­phe: Risk and Re­sponse), and the con­trib­u­tors to Global Catas­trophic Risks (in­clud­ing Yud­kowsky, Han­son, and Bostrom).

Even Ray Kurzweil, known as an optimist about technology, devoted a chapter of his 2005 bestseller The Singularity is Near to a discussion of existential risks, including risks from AI. Though he discussed the possibility of existential catastrophe at length, his take on AI risk was cursory (p. 420):

In­her­ently there will be no ab­solute pro­tec­tion against strong AI. Although the ar­gu­ment is sub­tle I be­lieve that main­tain­ing an open free-mar­ket sys­tem for in­cre­men­tal sci­en­tific and tech­nolog­i­cal progress, in which each step is sub­ject to mar­ket ac­cep­tance, will provide the most con­struc­tive en­vi­ron­ment for tech­nol­ogy to em­body wide­spread hu­man val­ues. As I have pointed out, strong AI is emerg­ing from many di­verse efforts and will be deeply in­te­grated into our civ­i­liza­tion’s in­fras­truc­ture. In­deed, it will be in­ti­mately em­bed­ded in our bod­ies and brains. As such, it will re­flect our val­ues be­cause it will be us.

AI risk fi­nally be­came a “main­stream” topic in an­a­lytic philos­o­phy with Chalmers (2010) and an en­tire is­sue of Jour­nal of Con­scious­ness Stud­ies de­voted to the topic.

The ear­liest pop­u­lar dis­cus­sion of ma­chine su­per­in­tel­li­gence may have been in Christo­pher Evans’ in­ter­na­tional best­sel­ler The Mighty Micro (1979), pages 194-198, 231-233, and 237-246.

The Cur­rent Situation

Two decades have passed since the early tran­shu­man­ists be­gan to se­ri­ously dis­cuss AI risk and op­por­tu­nity on the ex­tropi­ans mailing list. (Be­fore that, some dis­cus­sions took place at the MIT AI lab, but that was be­fore the web was pop­u­lar, so they weren’t recorded.) What have we hu­mans done since then?

Lots of talk­ing. Hun­dreds of thou­sands of man-hours have been in­vested into dis­cus­sions on the ex­tropi­ans mailing list, SL4, Over­com­ing Bias, Less Wrong, the Sin­gu­lar­ity In­sti­tute’s de­ci­sion the­ory mailing list, sev­eral other in­ter­net fo­rums, and also in meat-space (es­pe­cially in the Bay Area near the Sin­gu­lar­ity In­sti­tute and in Oxford near FHI). Th­ese are difficult is­sues; talk­ing them through is usu­ally the first step to get­ting any­thing else done.

Organization. Mailing lists are a form of organization, as are institutions like the Singularity Institute and university departments like FHI and FutureTech. Established organizations provide opportunities to bring people together, and to pool and direct resources efficiently.

Resources. Many people of considerable wealth, along with thousands of other “concerned citizens” around the world, have decided that AI is the most significant risk and opportunity we face, and are willing to invest in humanity’s future.

Outreach. Publications (both academic and popular), talks, and interactions with major and minor media outlets have been used to raise awareness of AI risk and opportunity. This has included outreach to specific AGI researchers, some of whom now take AI safety quite seriously. It has also included outreach to people in positions of influence who can engage in differential technological development, and to the rapidly growing “optimal philanthropy” community; a large fraction of those associated with Giving What We Can take existential risk, and AI risk in particular, quite seriously.

Research. So far, most research on the topic has been concerned with trying to become less confused about what, exactly, the problem is, how worried we should be, and which strategic actions we should take. How do we predict technological progress? How can we predict AI outcomes? Which interventions, taken now, would probably increase the odds of positive AI outcomes? There has also been some “technical” research in decision theory (e.g. TDT, UDT, ADT), the math of AI goal systems (“Learning What to Value,” “Ontological Crises in Artificial Agents’ Value Systems,” “Convergence of Expected Utility for Universal AI”), and Yudkowsky’s unpublished research on Friendly AI.

Muehlhauser (2011) provides an overview of the categories of research problems we have left to solve. Most of the known problems aren’t even well-defined at this point.