There’s No Fire Alarm for Artificial General Intelligence

What is the func­tion of a fire alarm?

One might think that the func­tion of a fire alarm is to provide you with im­por­tant ev­i­dence about a fire ex­ist­ing, al­low­ing you to change your policy ac­cord­ingly and exit the build­ing.

In the clas­sic ex­per­i­ment by Latane and Dar­ley in 1968, eight groups of three stu­dents each were asked to fill out a ques­tion­naire in a room that shortly af­ter be­gan filling up with smoke. Five out of the eight groups didn’t re­act or re­port the smoke, even as it be­came dense enough to make them start cough­ing. Sub­se­quent ma­nipu­la­tions showed that a lone stu­dent will re­spond 75% of the time; while a stu­dent ac­com­panied by two ac­tors told to feign ap­a­thy will re­spond only 10% of the time. This and other ex­per­i­ments seemed to pin down that what’s hap­pen­ing is plu­ral­is­tic ig­no­rance. We don’t want to look pan­icky by be­ing afraid of what isn’t an emer­gency, so we try to look calm while glanc­ing out of the cor­ners of our eyes to see how oth­ers are re­act­ing, but of course they are also try­ing to look calm.

(I’ve read a num­ber of repli­ca­tions and vari­a­tions on this re­search, and the effect size is blatant. I would not ex­pect this to be one of the re­sults that dies to the repli­ca­tion crisis, and I haven’t yet heard about the repli­ca­tion crisis touch­ing it. But we have to put a maybe-not marker on ev­ery­thing now.)

A fire alarm cre­ates com­mon knowl­edge, in the you-know-I-know sense, that there is a fire; af­ter which it is so­cially safe to re­act. When the fire alarm goes off, you know that ev­ery­one else knows there is a fire, you know you won’t lose face if you pro­ceed to exit the build­ing.

The fire alarm doesn’t tell us with cer­tainty that a fire is there. In fact, I can’t re­call one time in my life when, ex­it­ing a build­ing on a fire alarm, there was an ac­tual fire. Really, a fire alarm is weaker ev­i­dence of fire than smoke com­ing from un­der a door.

But the fire alarm tells us that it’s so­cially okay to re­act to the fire. It promises us with cer­tainty that we won’t be em­bar­rassed if we now pro­ceed to exit in an or­derly fash­ion.

It seems to me that this is one of the cases where peo­ple have mis­taken be­liefs about what they be­lieve, like when some­body loudly en­dors­ing their city’s team to win the big game will back down as soon as asked to bet. They haven’t con­sciously dis­t­in­guished the re­ward­ing ex­hil­a­ra­tion of shout­ing that the team will win, from the feel­ing of an­ti­ci­pat­ing the team will win.

When peo­ple look at the smoke com­ing from un­der the door, I think they think their un­cer­tain wob­bling feel­ing comes from not as­sign­ing the fire a high-enough prob­a­bil­ity of re­ally be­ing there, and that they’re re­luc­tant to act for fear of wast­ing effort and time. If so, I think they’re in­ter­pret­ing their own feel­ings mis­tak­enly. If that was so, they’d get the same wob­bly feel­ing on hear­ing the fire alarm, or even more so, be­cause fire alarms cor­re­late to fire less than does smoke com­ing from un­der a door. The un­cer­tain wob­bling feel­ing comes from the worry that oth­ers be­lieve differ­ently, not the worry that the fire isn’t there. The re­luc­tance to act is the re­luc­tance to be seen look­ing fool­ish, not the re­luc­tance to waste effort. That’s why the stu­dent alone in the room does some­thing about the fire 75% of the time, and why peo­ple have no trou­ble re­act­ing to the much weaker ev­i­dence pre­sented by fire alarms.

* * *

It’s now and then pro­posed that we ought to start re­act­ing later to the is­sues of Ar­tifi­cial Gen­eral In­tel­li­gence (back­ground here), be­cause, it is said, we are so far away from it that it just isn’t pos­si­ble to do pro­duc­tive work on it to­day.

(For direct argument about there being things doable today, see: Soares and Fallenstein (2014/2017); Amodei, Olah, Steinhardt, Christiano, Schulman, and Mané (2016); or Taylor, Yudkowsky, LaVictoire, and Critch (2016).)

(If none of those pa­pers ex­isted or if you were an AI re­searcher who’d read them but thought they were all garbage, and you wished you could work on al­ign­ment but knew of noth­ing you could do, the wise next step would be to sit down and spend two hours by the clock sincerely try­ing to think of pos­si­ble ap­proaches. Prefer­ably with­out self-sab­o­tage that makes sure you don’t come up with any­thing plau­si­ble; as might hap­pen if, hy­po­thet­i­cally speak­ing, you would ac­tu­ally find it much more com­fortable to be­lieve there was noth­ing you ought to be work­ing on to­day, be­cause e.g. then you could work on other things that in­ter­ested you more.)

(But never mind.)

So if AGI seems far-ish away, and you think the con­clu­sion li­censed by this is that you can’t do any pro­duc­tive work on AGI al­ign­ment yet, then the im­plicit al­ter­na­tive strat­egy on offer is: Wait for some un­speci­fied fu­ture event that tells us AGI is com­ing near; and then we’ll all know that it’s okay to start work­ing on AGI al­ign­ment.

This seems to me to be wrong on a num­ber of grounds. Here are some of them.

One: As Stu­art Rus­sell ob­served, if you get ra­dio sig­nals from space and spot a space­ship there with your telescopes and you know the aliens are land­ing in thirty years, you still start think­ing about that to­day.

You’re not like, “Meh, that’s thirty years off, what­ever.” You cer­tainly don’t ca­su­ally say “Well, there’s noth­ing we can do un­til they’re closer.” Not with­out spend­ing two hours, or at least five min­utes by the clock, brain­storm­ing about whether there is any­thing you ought to be start­ing now.

If you said the aliens were com­ing in thirty years and you were there­fore go­ing to do noth­ing to­day… well, if these were more effec­tive times, some­body would ask for a sched­ule of what you thought ought to be done, start­ing when, how long be­fore the aliens ar­rive. If you didn’t have that sched­ule ready, they’d know that you weren’t op­er­at­ing ac­cord­ing to a worked table of timed re­sponses, but just pro­cras­ti­nat­ing and do­ing noth­ing; and they’d cor­rectly in­fer that you prob­a­bly hadn’t searched very hard for things that could be done to­day.

In Bryan Ca­plan’s terms, any­one who seems quite ca­sual about the fact that “noth­ing can be done now to pre­pare” about the aliens is miss­ing a mood; they should be much more alarmed at not be­ing able to think of any way to pre­pare. And maybe ask if some­body else has come up with any ideas? But never mind.

Two: His­tory shows that for the gen­eral pub­lic, and even for sci­en­tists not in a key in­ner cir­cle, and even for sci­en­tists in that key cir­cle, it is very of­ten the case that key tech­nolog­i­cal de­vel­op­ments still seem decades away, five years be­fore they show up.

In 1901, two years be­fore helping build the first heav­ier-than-air flyer, Wilbur Wright told his brother that pow­ered flight was fifty years away.

In 1939, three years be­fore he per­son­ally over­saw the first crit­i­cal chain re­ac­tion in a pile of ura­nium bricks, En­rico Fermi voiced 90% con­fi­dence that it was im­pos­si­ble to use ura­nium to sus­tain a fis­sion chain re­ac­tion. I be­lieve Fermi also said a year af­ter that, aka two years be­fore the de­noue­ment, that if net power from fis­sion was even pos­si­ble (as he then granted some greater plau­si­bil­ity) then it would be fifty years off; but for this I ne­glected to keep the cita­tion.

And of course if you’re not the Wright Brothers or En­rico Fermi, you will be even more sur­prised. Most of the world learned that atomic weapons were now a thing when they woke up to the head­lines about Hiroshima. There were es­teemed in­tel­lec­tu­als say­ing four years af­ter the Wright Flyer that heav­ier-than-air flight was im­pos­si­ble, be­cause knowl­edge prop­a­gated more slowly back then.

Were there events that, in hind­sight, to­day, we can see as signs that heav­ier-than-air flight or nu­clear en­ergy were near­ing? Sure, but if you go back and read the ac­tual news­pa­pers from that time and see what peo­ple ac­tu­ally said about it then, you’ll see that they did not know that these were signs, or that they were very un­cer­tain that these might be signs. Some play­ing the part of Ex­cited Fu­tur­ists pro­claimed that big changes were im­mi­nent, I ex­pect, and oth­ers play­ing the part of Sober Scien­tists tried to pour cold wa­ter on all that childish en­thu­si­asm; I ex­pect that part was more or less ex­actly the same decades ear­lier. If some­where in that din was a su­perfore­caster who said “decades” when it was decades and “5 years” when it was five, good luck notic­ing them amid all the noise. More likely, the su­perfore­cast­ers were the ones who said “Could be to­mor­row, could be decades” both when the big de­vel­op­ment was a day away and when it was decades away.

One of the ma­jor modes by which hind­sight bias makes us feel that the past was more pre­dictable than any­one was ac­tu­ally able to pre­dict at the time, is that in hind­sight we know what we ought to no­tice, and we fix­ate on only one thought as to what each piece of ev­i­dence in­di­cates. If you look at what peo­ple ac­tu­ally say at the time, his­tor­i­cally, they’ve usu­ally got no clue what’s about to hap­pen three months be­fore it hap­pens, be­cause they don’t know which signs are which.

I mean, you could say the words “AGI is 50 years away” and have those words hap­pen to be true. Peo­ple were also say­ing that pow­ered flight was decades away when it was in fact decades away, and those peo­ple hap­pened to be right. The prob­lem is that ev­ery­thing looks the same to you ei­ther way, if you are ac­tu­ally liv­ing his­tory in­stead of read­ing about it af­ter­wards.

It’s not that when­ever some­body says “fifty years” the thing always hap­pens in two years. It’s that this con­fi­dent pre­dic­tion of things be­ing far away cor­re­sponds to an epistemic state about the tech­nol­ogy that feels the same way in­ter­nally un­til you are very very close to the big de­vel­op­ment. It’s the epistemic state of “Well, I don’t see how to do the thing” and some­times you say that fifty years off from the big de­vel­op­ment, and some­times you say it two years away, and some­times you say it while the Wright Flyer is fly­ing some­where out of your sight.

Three: Progress is driven by peak knowl­edge, not av­er­age knowl­edge.

If Fermi and the Wrights couldn’t see it com­ing three years out, imag­ine how hard it must be for any­one else to see it.

If you’re not at the global peak of knowledge of how to do the thing, and looped in on all the progress being made at what will turn out to be the leading project, you aren’t going to be able to see of your own knowledge at all that the big development is imminent. Unless you are very good at perspective-taking in a way that wasn’t necessary in a hunter-gatherer tribe, and very good at realizing that other people may know techniques and ideas of which you have no inkling even that you do not know them. If you don’t consciously compensate for the lessons of history in this regard, then you will promptly say the decades-off thing. Fermi wasn’t still thinking that net nuclear energy was impossible or decades away by the time he got to 3 months before he built the first pile, because at that point Fermi was looped in on everything and saw how to do it. But anyone not looped in probably still felt like it was fifty years away while the actual pile was fizzing away in a squash court at the University of Chicago.

Peo­ple don’t seem to au­to­mat­i­cally com­pen­sate for the fact that the timing of the big de­vel­op­ment is a func­tion of the peak knowl­edge in the field, a thresh­old touched by the peo­ple who know the most and have the best ideas; while they them­selves have av­er­age knowl­edge; and there­fore what they them­selves know is not strong ev­i­dence about when the big de­vel­op­ment hap­pens. I think they aren’t think­ing about that at all, and they just eye­ball it us­ing their own sense of difficulty. If they are think­ing any­thing more de­liber­ate and re­flec­tive than that, and in­cor­po­rat­ing real work into cor­rect­ing for the fac­tors that might bias their lenses, they haven’t both­ered writ­ing down their rea­son­ing any­where I can read it.

To know that AGI is decades away, we would need enough un­der­stand­ing of AGI to know what pieces of the puz­zle are miss­ing, and how hard these pieces are to ob­tain; and that kind of in­sight is un­likely to be available un­til the puz­zle is com­plete. Which is also to say that to any­one out­side the lead­ing edge, the puz­zle will look more in­com­plete than it looks on the edge. That pro­ject may pub­lish their the­o­ries in ad­vance of prov­ing them, al­though I hope not. But there are un­proven the­o­ries now too.

And again, that’s not to say that peo­ple say­ing “fifty years” is a cer­tain sign that some­thing is hap­pen­ing in a squash court; they were say­ing “fifty years” sixty years ago too. It’s say­ing that any­one who thinks tech­nolog­i­cal timelines are ac­tu­ally fore­castable, in ad­vance, by peo­ple who are not looped in to the lead­ing pro­ject’s progress re­ports and who don’t share all the best ideas about ex­actly how to do the thing and how much effort is re­quired for that, is learn­ing the wrong les­son from his­tory. In par­tic­u­lar, from read­ing his­tory books that neatly lay out lines of progress and their visi­ble signs that we all know now were im­por­tant and ev­i­den­tial. It’s some­times pos­si­ble to say use­ful con­di­tional things about the con­se­quences of the big de­vel­op­ment when­ever it hap­pens, but it’s rarely pos­si­ble to make con­fi­dent pre­dic­tions about the timing of those de­vel­op­ments, be­yond a one- or two-year hori­zon. And if you are one of the rare peo­ple who can call the timing, if peo­ple like that even ex­ist, no­body else knows to pay at­ten­tion to you and not to the Ex­cited Fu­tur­ists or Sober Skep­tics.

Four: The fu­ture uses differ­ent tools, and can there­fore eas­ily do things that are very hard now, or do with difficulty things that are im­pos­si­ble now.

Why do we know that AGI is decades away? In pop­u­lar ar­ti­cles penned by heads of AI re­search labs and the like, there are typ­i­cally three promi­nent rea­sons given:

(A) The au­thor does not know how to build AGI us­ing pre­sent tech­nol­ogy. The au­thor does not know where to start.

(B) The au­thor thinks it is re­ally very hard to do the im­pres­sive things that mod­ern AI tech­nol­ogy does, they have to slave long hours over a hot GPU farm tweak­ing hy­per­pa­ram­e­ters to get it done. They think that the pub­lic does not ap­pre­ci­ate how hard it is to get any­thing done right now, and is pan­ick­ing pre­ma­turely be­cause the pub­lic thinks any­one can just fire up Ten­sorflow and build a robotic car.

(C) The au­thor spends a lot of time in­ter­act­ing with AI sys­tems and there­fore is able to per­son­ally ap­pre­ci­ate all the ways in which they are still stupid and lack com­mon sense.

We’ve now con­sid­ered some as­pects of ar­gu­ment A. Let’s con­sider ar­gu­ment B for a mo­ment.

Sup­pose I say: “It is now pos­si­ble for one comp-sci grad to do in a week any­thing that N+ years ago the re­search com­mu­nity could do with neu­ral net­works at all.” How large is N?

I got some answers to this on Twitter from people whose credentials I don’t know, but the most common answer was five, which sounds about right to me based on my own acquaintance with machine learning. (Though obviously not as a literal universal, because reality is never that neat.) If you could do something in 2012, period, you can probably do it fairly straightforwardly with modern GPUs, Tensorflow, Xavier initialization, batch normalization, ReLUs, and Adam or RMSprop or just stochastic gradient descent with momentum. The modern techniques are just that much better. To be sure, there are things we can’t do now with just those simple methods, things that require tons more work, but those things were not possible at all in 2012.
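
To make “fairly straightforwardly” concrete, here is a minimal sketch of a small classifier assembled from exactly those off-the-shelf pieces. The dataset, layer sizes, and training settings are arbitrary illustrative choices, not anything claimed in the text above:

```python
# Minimal sketch: a 2012-era-scale classifier built from the modern pieces
# named above (Xavier/Glorot init, batch norm, ReLU, Adam). Layer sizes and
# dataset are arbitrary choices for illustration.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, kernel_initializer="glorot_uniform",  # Xavier initialization
                          input_shape=(784,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),  # or RMSprop, or SGD with momentum
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128)
```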

In ma­chine learn­ing, when you can do some­thing at all, you are prob­a­bly at most a few years away from be­ing able to do it eas­ily us­ing the fu­ture’s much su­pe­rior tools. From this stand­point, ar­gu­ment B, “You don’t un­der­stand how hard it is to do what we do,” is some­thing of a non-se­quitur when it comes to timing.

State­ment B sounds to me like the same sen­ti­ment voiced by Rutherford in 1933 when he called net en­ergy from atomic fis­sion “moon­sh­ine”. If you were a nu­clear physi­cist in 1933 then you had to split all your atoms by hand, by bom­bard­ing them with other par­ti­cles, and it was a la­bo­ri­ous busi­ness. If some­body talked about get­ting net en­ergy from atoms, maybe it made you feel that you were un­ap­pre­ci­ated, that peo­ple thought your job was easy.

But of course this will always be the lived ex­pe­rience for AI en­g­ineers on se­ri­ous fron­tier pro­jects. You don’t get paid big bucks to do what a grad stu­dent can do in a week (un­less you’re work­ing for a bu­reau­cracy with no clue about AI; but that’s not Google or FB). Your per­sonal ex­pe­rience will always be that what you are paid to spend months do­ing is difficult. A change in this per­sonal ex­pe­rience is there­fore not some­thing you can use as a fire alarm.

Those play­ing the part of wiser sober skep­ti­cal sci­en­tists would ob­vi­ously agree in the ab­stract that our tools will im­prove; but in the pop­u­lar ar­ti­cles they pen, they just talk about the painstak­ing difficulty of this year’s tools. I think that when they’re in that mode they are not even try­ing to fore­cast what the tools will be like in 5 years; they haven’t writ­ten down any such ar­gu­ments as part of the ar­ti­cles I’ve read. I think that when they tell you that AGI is decades off, they are liter­ally giv­ing an es­ti­mate of how long it feels to them like it would take to build AGI us­ing their cur­rent tools and knowl­edge. Which is why they em­pha­size how hard it is to stir the heap of lin­ear alge­bra un­til it spits out good an­swers; I think they are not imag­in­ing, at all, into how this ex­pe­rience may change over con­sid­er­ably less than fifty years. If they’ve ex­plic­itly con­sid­ered the bias of es­ti­mat­ing fu­ture tech timelines based on their pre­sent sub­jec­tive sense of difficulty, and tried to com­pen­sate for that bias, they haven’t writ­ten that rea­son­ing down any­where I’ve read it. Nor have I ever heard of that fore­cast­ing method giv­ing good re­sults his­tor­i­cally.

Five: Okay, let’s be blunt here. I don’t think most of the dis­course about AGI be­ing far away (or that it’s near) is be­ing gen­er­ated by mod­els of fu­ture progress in ma­chine learn­ing. I don’t think we’re look­ing at wrong mod­els; I think we’re look­ing at no mod­els.

I was once at a con­fer­ence where there was a panel full of fa­mous AI lu­mi­nar­ies, and most of the lu­mi­nar­ies were nod­ding and agree­ing with each other that of course AGI was very far off, ex­cept for two fa­mous AI lu­mi­nar­ies who stayed quiet and let oth­ers take the micro­phone.

I got up in Q&A and said, “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more con­crete and spe­cific. I’d like to know what’s the least im­pres­sive ac­com­plish­ment that you are very con­fi­dent can­not be done in the next two years.”

There was a silence.

Even­tu­ally, two peo­ple on the panel ven­tured replies, spo­ken in a rather more ten­ta­tive tone than they’d been us­ing to pro­nounce that AGI was decades out. They named “A robot puts away the dishes from a dish­washer with­out break­ing them”, and Wino­grad schemas. Speci­fi­cally, “I feel quite con­fi­dent that the Wino­grad schemas—where we re­cently had a re­sult that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that re­gard­less of the tech­niques peo­ple use.”

A few months af­ter that panel, there was un­ex­pect­edly a big break­through on Wino­grad schemas. The break­through didn’t crack 80%, so three cheers for wide cred­i­bil­ity in­ter­vals with er­ror mar­gin, but I ex­pect the pre­dic­tor might be feel­ing slightly more ner­vous now with one year left to go. (I don’t think it was the break­through I re­mem­ber read­ing about, but Rob turned up this pa­per as an ex­am­ple of one that could have been sub­mit­ted at most 44 days af­ter the above con­fer­ence and gets up to 70%.)

But that’s not the point. The point is the silence that fell af­ter my ques­tion, and that even­tu­ally I only got two replies, spo­ken in ten­ta­tive tones. When I asked for con­crete feats that were im­pos­si­ble in the next two years, I think that that’s when the lu­mi­nar­ies on that panel switched to try­ing to build a men­tal model of fu­ture progress in ma­chine learn­ing, ask­ing them­selves what they could or couldn’t pre­dict, what they knew or didn’t know. And to their credit, most of them did know their pro­fes­sion well enough to re­al­ize that fore­cast­ing fu­ture bound­aries around a rapidly mov­ing field is ac­tu­ally re­ally hard, that no­body knows what will ap­pear on arXiv next month, and that they needed to put wide cred­i­bil­ity in­ter­vals with very gen­er­ous up­per bounds on how much progress might take place twenty-four months’ worth of arXiv pa­pers later.

(Also, Demis Hass­abis was pre­sent, so they all knew that if they named some­thing in­suffi­ciently im­pos­si­ble, Demis would have Deep­Mind go and do it.)

The ques­tion I asked was in a com­pletely differ­ent genre from the panel dis­cus­sion, re­quiring a men­tal con­text switch: the as­sem­bled lu­mi­nar­ies ac­tu­ally had to try to con­sult their rough, scarce-formed in­tu­itive mod­els of progress in ma­chine learn­ing and figure out what fu­ture ex­pe­riences, if any, their model of the field definitely pro­hibited within a two-year time hori­zon. In­stead of, well, emit­ting so­cially de­sir­able ver­bal be­hav­ior meant to kill that darned hype about AGI and get some pre­dictable ap­plause from the au­di­ence.

I’ll be blunt: I don’t think the con­fi­dent long-ter­mism has been thought out at all. If your model has the ex­traor­di­nary power to say what will be im­pos­si­ble in ten years af­ter an­other one hun­dred and twenty months of arXiv pa­pers, then you ought to be able to say much weaker things that are im­pos­si­ble in two years, and you should have those pre­dic­tions queued up and ready to go rather than fal­ling into ner­vous silence af­ter be­ing asked.

In re­al­ity, the two-year prob­lem is hard and the ten-year prob­lem is laugh­ably hard. The fu­ture is hard to pre­dict in gen­eral, our pre­dic­tive grasp on a rapidly chang­ing and ad­vanc­ing field of sci­ence and en­g­ineer­ing is very weak in­deed, and it doesn’t per­mit nar­row cred­ible in­ter­vals on what can’t be done.

Grace et al. (2017) sur­veyed the pre­dic­tions of 352 pre­sen­ters at ICML and NIPS 2015. Re­spon­dents’ ag­gre­gate fore­cast was that the propo­si­tion “all oc­cu­pa­tions are fully au­tomat­able” (in the sense that “for any oc­cu­pa­tion, ma­chines could be built to carry out the task bet­ter and more cheaply than hu­man work­ers”) will not reach 50% prob­a­bil­ity un­til 121 years hence. Ex­cept that a ran­dom­ized sub­set of re­spon­dents were in­stead asked the slightly differ­ent ques­tion of “when un­aided ma­chines can ac­com­plish ev­ery task bet­ter and more cheaply than hu­man work­ers”, and in this case held that this was 50% likely to oc­cur within 44 years.

That’s what hap­pens when you ask peo­ple to pro­duce an es­ti­mate they can’t es­ti­mate, and there’s a so­cial sense of what the de­sir­able ver­bal be­hav­ior is sup­posed to be.

* * *

When I ob­serve that there’s no fire alarm for AGI, I’m not say­ing that there’s no pos­si­ble equiv­a­lent of smoke ap­pear­ing from un­der a door.

What I’m say­ing rather is that the smoke un­der the door is always go­ing to be ar­guable; it is not go­ing to be a clear and un­de­ni­able and ab­solute sign of fire; and so there is never go­ing to be a fire alarm pro­duc­ing com­mon knowl­edge that ac­tion is now due and so­cially ac­cept­able.

There’s an old trope say­ing that as soon as some­thing is ac­tu­ally done, it ceases to be called AI. Peo­ple who work in AI and are in a broad sense pro-ac­cel­er­a­tionist and techno-en­thu­si­ast, what you might call the Kurzweilian camp (of which I am not a mem­ber), will some­times rail against this as un­fair­ness in judg­ment, as mov­ing goal­posts.

This over­looks a real and im­por­tant phe­nomenon of ad­verse se­lec­tion against AI ac­com­plish­ments: If you can do some­thing im­pres­sive-sound­ing with AI in 1974, then that is be­cause that thing turned out to be doable in some cheap cheaty way, not be­cause 1974 was so amaz­ingly great at AI. We are un­cer­tain about how much cog­ni­tive effort it takes to perform tasks, and how easy it is to cheat at them, and the first “im­pres­sive” tasks to be ac­com­plished will be those where we were most wrong about how much effort was re­quired. There was a time when some peo­ple thought that a com­puter win­ning the world chess cham­pi­onship would re­quire progress in the di­rec­tion of AGI, and that this would count as a sign that AGI was get­ting closer. When Deep Blue beat Kas­parov in 1997, in a Bayesian sense we did learn some­thing about progress in AI, but we also learned some­thing about chess be­ing easy. Con­sid­er­ing the tech­niques used to con­struct Deep Blue, most of what we learned was “It is sur­pris­ingly pos­si­ble to play chess with­out easy-to-gen­er­al­ize tech­niques” and not much “A sur­pris­ing amount of progress has been made to­ward AGI.”
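
To spell out that Bayesian point with a toy calculation (the numbers below are invented purely for illustration, not estimates of anything):

```python
# Toy Bayes update with made-up numbers, illustrating the Deep Blue point above:
# seeing *how* a milestone was achieved tells us which hypothesis to credit.
prior = {"chess_is_narrowly_cheatable": 0.5, "general_progress_was_needed": 0.5}

# Probability of the observed evidence ("the winning program was brute-force
# search plus handcrafted evaluation, nothing general-purpose") under each
# hypothesis. These likelihoods are invented for illustration only.
likelihood = {"chess_is_narrowly_cheatable": 0.9, "general_progress_was_needed": 0.1}

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)
# -> {'chess_is_narrowly_cheatable': 0.9, 'general_progress_was_needed': 0.1}
```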

Was AlphaGo smoke un­der the door, a sign of AGI in 10 years or less? Peo­ple had pre­vi­ously given Go as an ex­am­ple of What You See Be­fore The End.

Look­ing over the pa­per de­scribing AlphaGo’s ar­chi­tec­ture, it seemed to me that we were mostly learn­ing that available AI tech­niques were likely to go fur­ther to­wards gen­er­al­ity than ex­pected, rather than about Go be­ing sur­pris­ingly easy to achieve with fairly nar­row and ad-hoc ap­proaches. Not that the method scales to AGI, ob­vi­ously; but AlphaGo did look like a product of rel­a­tively gen­eral in­sights and tech­niques be­ing turned on the spe­cial case of Go, in a way that Deep Blue wasn’t. I also up­dated sig­nifi­cantly on “The gen­eral learn­ing ca­pa­bil­ities of the hu­man cor­ti­cal al­gorithm are less im­pres­sive, less difficult to cap­ture with a ton of gra­di­ent de­scent and a zillion GPUs, than I thought,” be­cause if there were any­where we ex­pected an im­pres­sive hard-to-match highly-nat­u­ral-se­lected but-still-gen­eral cor­ti­cal al­gorithm to come into play, it would be in hu­mans play­ing Go.

Maybe if we’d seen a thou­sand Earths un­der­go­ing similar events, we’d gather the statis­tics and find that a com­puter win­ning the plane­tary Go cham­pi­onship is a re­li­able ten-year-harbinger of AGI. But I don’t ac­tu­ally know that. Nei­ther do you. Cer­tainly, any­one can pub­li­cly ar­gue that we just learned Go was eas­ier to achieve with strictly nar­row tech­niques than ex­pected, as was true many times in the past. There’s no pos­si­ble sign short of ac­tual AGI, no case of smoke from un­der the door, for which we know that this is definitely se­ri­ous fire and now AGI is 10, 5, or 2 years away. Let alone a sign where we know ev­ery­one else will be­lieve it.

And in any case, mul­ti­ple lead­ing sci­en­tists in ma­chine learn­ing have already pub­lished ar­ti­cles tel­ling us their crite­rion for a fire alarm. They will be­lieve Ar­tifi­cial Gen­eral In­tel­li­gence is im­mi­nent:

(A) When they per­son­ally see how to con­struct AGI us­ing their cur­rent tools. This is what they are always say­ing is not cur­rently true in or­der to cas­ti­gate the folly of those who think AGI might be near.

(B) When their per­sonal jobs do not give them a sense of ev­ery­thing be­ing difficult. This, they are at pains to say, is a key piece of knowl­edge not pos­sessed by the ig­no­rant layfolk who think AGI might be near, who only be­lieve that be­cause they have never stayed up un­til 2AM try­ing to get a gen­er­a­tive ad­ver­sar­ial net­work to sta­bi­lize.

(C) When they are very im­pressed by how smart their AI is rel­a­tive to a hu­man be­ing in re­spects that still feel mag­i­cal to them; as op­posed to the parts they do know how to en­g­ineer, which no longer seem mag­i­cal to them; aka the AI seem­ing pretty smart in in­ter­ac­tion and con­ver­sa­tion; aka the AI ac­tu­ally be­ing an AGI already.

So there isn’t go­ing to be a fire alarm. Pe­riod.

There is never go­ing to be a time be­fore the end when you can look around ner­vously, and see that it is now clearly com­mon knowl­edge that you can talk about AGI be­ing im­mi­nent, and take ac­tion and exit the build­ing in an or­derly fash­ion, with­out fear of look­ing stupid or fright­ened.

* * *

So far as I can presently es­ti­mate, now that we’ve had AlphaGo and a cou­ple of other maybe/​maybe-not shots across the bow, and seen a huge ex­plo­sion of effort in­vested into ma­chine learn­ing and an enor­mous flood of pa­pers, we are prob­a­bly go­ing to oc­cupy our pre­sent epistemic state un­til very near the end.

By say­ing we’re prob­a­bly go­ing to be in roughly this epistemic state un­til al­most the end, I don’t mean to say we know that AGI is im­mi­nent, or that there won’t be im­por­tant new break­throughs in AI in the in­ter­ven­ing time. I mean that it’s hard to guess how many fur­ther in­sights are needed for AGI, or how long it will take to reach those in­sights. After the next break­through, we still won’t know how many more break­throughs are needed, leav­ing us in pretty much the same epistemic state as be­fore. What­ever dis­cov­er­ies and mile­stones come next, it will prob­a­bly con­tinue to be hard to guess how many fur­ther in­sights are needed, and timelines will con­tinue to be similarly murky. Maybe re­searcher en­thu­si­asm and fund­ing will rise fur­ther, and we’ll be able to say that timelines are short­en­ing; or maybe we’ll hit an­other AI win­ter, and we’ll know that’s a sign in­di­cat­ing that things will take longer than they would oth­er­wise; but we still won’t know how long.

At some point we might see a sud­den flood of arXiv pa­pers in which re­ally in­ter­est­ing and fun­da­men­tal and scary cog­ni­tive challenges seem to be get­ting done at an in­creas­ing pace. Where­upon, as this flood ac­cel­er­ates, even some who imag­ine them­selves sober and skep­ti­cal will be un­nerved to the point that they ven­ture that per­haps AGI is only 15 years away now, maybe, pos­si­bly. The signs might be­come so blatant, very soon be­fore the end, that peo­ple start think­ing it is so­cially ac­cept­able to say that maybe AGI is 10 years off. Though the signs would have to be pretty darned blatant, if they’re to over­come the so­cial bar­rier posed by lu­mi­nar­ies who are es­ti­mat­ing ar­rival times to AGI us­ing their per­sonal knowl­edge and per­sonal difficul­ties, as well as all the his­tor­i­cal bad feel­ings about AI win­ters caused by hype.

But even if it be­comes so­cially ac­cept­able to say that AGI is 15 years out, in those last cou­ple of years or months, I would still ex­pect there to be dis­agree­ment. There will still be oth­ers protest­ing that, as much as as­so­ci­a­tive mem­ory and hu­man-equiv­a­lent cere­bel­lar co­or­di­na­tion (or what­ever) are now solved prob­lems, they still don’t know how to con­struct AGI. They will note that there are no AIs writ­ing com­puter sci­ence pa­pers, or hold­ing a truly sen­si­ble con­ver­sa­tion with a hu­man, and cas­ti­gate the sense­less alarmism of those who talk as if we already knew how to do that. They will ex­plain that fool­ish laypeo­ple don’t re­al­ize how much pain and tweak­ing it takes to get the cur­rent sys­tems to work. (Although those mod­ern meth­ods can eas­ily do al­most any­thing that was pos­si­ble in 2017, and any grad stu­dent knows how to roll a sta­ble GAN on the first try us­ing the tf.un­su­per­vised mod­ule in Ten­sorflow 5.3.1.)

When all the pieces are ready and in place, lack­ing only the last piece to be as­sem­bled by the very peak of knowl­edge and cre­ativity across the whole world, it will still seem to the av­er­age ML per­son that AGI is an enor­mous challenge loom­ing in the dis­tance, be­cause they still won’t per­son­ally know how to con­struct an AGI sys­tem. Pres­ti­gious heads of ma­jor AI re­search groups will still be writ­ing ar­ti­cles de­cry­ing the folly of fret­ting about the to­tal de­struc­tion of all Earthly life and all fu­ture value it could have achieved, and say­ing that we should not let this dis­tract us from real, re­spectable con­cerns like loan-ap­proval sys­tems ac­ci­den­tally ab­sorb­ing hu­man bi­ases.

Of course, the fu­ture is very hard to pre­dict in de­tail. It’s so hard that not only do I con­fess my own in­abil­ity, I make the far stronger pos­i­tive state­ment that no­body else can do it ei­ther. The “flood of ground­break­ing arXiv pa­pers” sce­nario is one way things could maybe pos­si­bly go, but it’s an im­plau­si­bly spe­cific sce­nario that I made up for the sake of con­crete­ness. It’s cer­tainly not based on my ex­ten­sive ex­pe­rience watch­ing other Earth­like civ­i­liza­tions de­velop AGI. I do put a sig­nifi­cant chunk of prob­a­bil­ity mass on “There’s not much sign visi­ble out­side a Man­hat­tan Pro­ject un­til Hiroshima,” be­cause that sce­nario is sim­ple. Any­thing more com­plex is just one more story full of bur­den­some de­tails that aren’t likely to all be true.

But no mat­ter how the de­tails play out, I do pre­dict in a very gen­eral sense that there will be no fire alarm that is not an ac­tual run­ning AGI—no un­mis­tak­able sign be­fore then that ev­ery­one knows and agrees on, that lets peo­ple act with­out feel­ing ner­vous about whether they’re wor­ry­ing too early. That’s just not how the his­tory of tech­nol­ogy has usu­ally played out in much sim­pler cases like flight and nu­clear en­g­ineer­ing, let alone a case like this one where all the signs and mod­els are dis­puted. We already know enough about the un­cer­tainty and low qual­ity of dis­cus­sion sur­round­ing this topic to be able to say with con­fi­dence that there will be no unar­guable so­cially ac­cepted sign of AGI ar­riv­ing 10 years, 5 years, or 2 years be­fore­hand. If there’s any gen­eral so­cial panic it will be by co­in­ci­dence, based on ter­rible rea­son­ing, un­cor­re­lated with real timelines ex­cept by to­tal co­in­ci­dence, set off by a Hol­ly­wood movie, and fo­cused on rel­a­tively triv­ial dan­gers.

It’s no co­in­ci­dence that no­body has given any ac­tual ac­count of such a fire alarm, and ar­gued con­vinc­ingly about how much time it means we have left, and what pro­jects we should only then start. If any­one does write that pro­posal, the next per­son to write one will say some­thing com­pletely differ­ent. And prob­a­bly nei­ther of them will suc­ceed at con­vinc­ing me that they know any­thing prophetic about timelines, or that they’ve iden­ti­fied any sen­si­ble an­gle of at­tack that is (a) worth pur­su­ing at all and (b) not worth start­ing to work on right now.

* * *

It seems to me that the de­ci­sion to de­lay all ac­tion un­til a neb­u­lous to­tally un­speci­fied fu­ture alarm goes off, im­plies an or­der of reck­less­ness great enough that the law of con­tinued failure comes into play.

The law of con­tinued failure is the rule that says that if your coun­try is in­com­pe­tent enough to use a plain­text 9-nu­meric-digit pass­word on all of your bank ac­counts and credit ap­pli­ca­tions, your coun­try is not com­pe­tent enough to cor­rect course af­ter the next dis­aster in which a hun­dred mil­lion pass­words are re­vealed. A civ­i­liza­tion com­pe­tent enough to cor­rect course in re­sponse to that prod, to re­act to it the way you’d want them to re­act, is com­pe­tent enough not to make the mis­take in the first place. When a sys­tem fails mas­sively and ob­vi­ously, rather than sub­tly and at the very edges of com­pe­tence, the next prod is not go­ing to cause the sys­tem to sud­denly snap into do­ing things in­tel­li­gently.

The law of con­tinued failure is es­pe­cially im­por­tant to keep in mind when you are deal­ing with big pow­er­ful sys­tems or high-sta­tus peo­ple that you might feel ner­vous about dero­gat­ing, be­cause you may be tempted to say, “Well, it’s flawed now, but as soon as a fu­ture prod comes along, ev­ery­thing will snap into place and ev­ery­thing will be all right.” The sys­tems about which this fond hope is ac­tu­ally war­ranted look like they are mostly do­ing all the im­por­tant things right already, and only failing in one or two steps of cog­ni­tion. The fond hope is al­most never war­ranted when a per­son or or­ga­ni­za­tion or gov­ern­ment or so­cial sub­sys­tem is cur­rently fal­ling mas­sively short.

The folly re­quired to ig­nore the prospect of aliens land­ing in thirty years is already great enough that the other flawed el­e­ments of the de­bate should come as no sur­prise.

And with all of that go­ing wrong si­mul­ta­neously to­day, we should pre­dict that the same sys­tem and in­cen­tives won’t pro­duce cor­rect out­puts af­ter re­ceiv­ing an un­cer­tain sign that maybe the aliens are land­ing in five years in­stead. The law of con­tinued failure sug­gests that if ex­ist­ing au­thor­i­ties failed in enough differ­ent ways at once to think that it makes sense to try to de­rail a con­ver­sa­tion about ex­is­ten­tial risk by say­ing the real prob­lem is the se­cu­rity on self-driv­ing cars, the de­fault ex­pec­ta­tion is that they will still be say­ing silly things later.

Peo­ple who make large num­bers of si­mul­ta­neous mis­takes don’t gen­er­ally have all of the in­cor­rect thoughts sub­con­sciously la­beled as “in­cor­rect” in their heads. Even when mo­ti­vated, they can’t sud­denly flip to skil­lfully ex­e­cut­ing all-cor­rect rea­son­ing steps in­stead. Yes, we have var­i­ous ex­per­i­ments show­ing that mon­e­tary in­cen­tives can re­duce over­con­fi­dence and poli­ti­cal bias, but (a) that’s re­duc­tion rather than elimi­na­tion, (b) it’s with ex­tremely clear short-term di­rect in­cen­tives, not the neb­u­lous and poli­ti­ciz­able in­cen­tive of “a lot be­ing at stake”, and (c) that doesn’t mean a switch is flip­ping all the way to “carry out com­pli­cated cor­rect rea­son­ing”. If some­one’s brain con­tains a switch that can flip to en­able com­pli­cated cor­rect rea­son­ing at all, it’s got enough in­ter­nal pre­ci­sion and skill to think mostly-cor­rect thoughts now in­stead of later—at least to the de­gree that some con­ser­vatism and dou­ble-check­ing gets built into ex­am­in­ing the con­clu­sions that peo­ple know will get them kil­led if they’re wrong about them.

There is no sign and por­tent, no thresh­old crossed, that sud­denly causes peo­ple to wake up and start do­ing things sys­tem­at­i­cally cor­rectly. Peo­ple who can re­act that com­pe­tently to any sign at all, let alone a less-than-perfectly-cer­tain not-to­tally-agreed item of ev­i­dence that is likely a wakeup call, have prob­a­bly already done the time­bind­ing thing. They’ve already imag­ined the fu­ture sign com­ing, and gone ahead and thought sen­si­ble thoughts ear­lier, like Stu­art Rus­sell say­ing, “If you know the aliens are land­ing in thirty years, it’s still a big deal now.”

* * *

Back in the fund­ing-starved early days of what is now MIRI, I learned that peo­ple who donated last year were likely to donate this year, and peo­ple who last year were plan­ning to donate “next year” would quite of­ten this year be plan­ning to donate “next year”. Of course there were gen­uine tran­si­tions from zero to one; ev­ery­thing that hap­pens needs to hap­pen for a first time. There were col­lege stu­dents who said “later” and gave noth­ing for a long time in a gen­uinely strate­gi­cally wise way, and went on to get nice jobs and start donat­ing. But I also learned well that, like many cheap and easy so­laces, say­ing the word “later” is ad­dic­tive; and that this lux­ury is available to the rich as well as the poor.

I don’t ex­pect it to be any differ­ent with AGI al­ign­ment work. Peo­ple who are try­ing to get what grasp they can on the al­ign­ment prob­lem will, in the next year, be do­ing a lit­tle (or a lot) bet­ter with what­ever they grasped in the pre­vi­ous year (plus, yes, any gen­eral-field ad­vances that have taken place in the mean­time). Peo­ple who want to defer that un­til af­ter there’s a bet­ter un­der­stand­ing of AI and AGI will, af­ter the next year’s worth of ad­vance­ments in AI and AGI, want to defer work un­til a bet­ter fu­ture un­der­stand­ing of AI and AGI.

Some peo­ple re­ally want al­ign­ment to get done and are there­fore now try­ing to wrack their brains about how to get some­thing like a re­in­force­ment learner to re­li­ably iden­tify a util­ity func­tion over par­tic­u­lar el­e­ments in a model of the causal en­vi­ron­ment in­stead of a sen­sory re­ward term or defeat the seem­ing tau­tolog­i­cal­ness of up­dated (non-)defer­ence. Others would rather be work­ing on other things, and will there­fore de­clare that there is no work that can pos­si­bly be done to­day, not spend­ing two hours quietly think­ing about it first be­fore mak­ing that dec­la­ra­tion. And this will not change to­mor­row, un­less per­haps to­mor­row is when we wake up to some in­ter­est­ing news­pa­per head­lines, and prob­a­bly not even then. The lux­ury of say­ing “later” is not available only to the truly poor-in-available-op­tions.
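
For readers who want a concrete, if cartoonish, picture of the first of those problems: the sketch below contrasts a reward computed from the agent’s sensory observation with a utility evaluated on the latent state in a model of the environment. Every name in it is invented for illustration; it is a caricature of the problem statement, not of any proposed solution.

```python
# Toy caricature (not a solution): a "sensory" reward judges what the sensor
# reports, while a utility over the modeled environment judges the latent
# world-state itself. All names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class WorldState:
    room_actually_clean: bool   # latent fact about the environment
    camera_shows_clean: bool    # what the agent's sensor reports

def sensory_reward(state: WorldState) -> float:
    # Rewards the observation; tampering with the camera scores highly.
    return 1.0 if state.camera_shows_clean else 0.0

def modeled_utility(state: WorldState) -> float:
    # Scores the latent variable in a causal model of the environment.
    # Getting a learner to reliably point at *this* is the open problem.
    return 1.0 if state.room_actually_clean else 0.0

tampered = WorldState(room_actually_clean=False, camera_shows_clean=True)
print(sensory_reward(tampered), modeled_utility(tampered))  # 1.0 vs 0.0
```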

After a while, I started tel­ling effec­tive al­tru­ists in col­lege: “If you’re plan­ning to earn-to-give later, then for now, give around $5 ev­ery three months. And never give ex­actly the same amount twice in a row, or give to the same or­ga­ni­za­tion twice in a row, so that you prac­tice the men­tal habit of re-eval­u­at­ing causes and re-eval­u­at­ing your dona­tion amounts on a reg­u­lar ba­sis. Don’t learn the men­tal habit of just always say­ing ‘later’.”

Similarly, if some­body was ac­tu­ally go­ing to work on AGI al­ign­ment “later”, I’d tell them to, ev­ery six months, spend a cou­ple of hours com­ing up with the best cur­rent scheme they can de­vise for al­ign­ing AGI and do­ing use­ful work on that scheme. As­sum­ing, if they must, that AGI were some­how done with tech­nol­ogy re­sem­bling cur­rent tech­nol­ogy. And pub­lish­ing their best-cur­rent-scheme-that-isn’t-good-enough, at least in the sense of post­ing it to Face­book; so that they will have a sense of em­bar­rass­ment about nam­ing a scheme that does not look like some­body ac­tu­ally spent two hours try­ing to think of the best bad ap­proach.

There are things we’ll bet­ter un­der­stand about AI in the fu­ture, and things we’ll learn that might give us more con­fi­dence that par­tic­u­lar re­search ap­proaches will be rele­vant to AGI. There may be more fu­ture so­ciolog­i­cal de­vel­op­ments akin to Nick Bostrom pub­lish­ing Su­per­in­tel­li­gence, Elon Musk tweet­ing about it and thereby heav­ing a rock through the Over­ton Win­dow, or more re­spectable lu­mi­nar­ies like Stu­art Rus­sell openly com­ing on board. The fu­ture will hold more AlphaGo-like events to pub­li­cly and pri­vately high­light new ground-level ad­vances in ML tech­nique; and it may some­how be that this does not leave us in the same epistemic state as hav­ing already seen AlphaGo and GANs and the like. It could hap­pen! I can’t see ex­actly how, but the fu­ture does have the ca­pac­ity to pull sur­prises in that re­gard.

But be­fore wait­ing on that sur­prise, you should ask whether your un­cer­tainty about AGI timelines is re­ally un­cer­tainty at all. If it feels to you that guess­ing AGI might have a 50% prob­a­bil­ity in N years is not enough knowl­edge to act upon, if that feels scar­ily un­cer­tain and you want to wait for more ev­i­dence be­fore mak­ing any de­ci­sions… then ask your­self how you’d feel if you be­lieved the prob­a­bil­ity was 50% in N years, and ev­ery­one else on Earth also be­lieved it was 50% in N years, and ev­ery­one be­lieved it was right and proper to carry out policy P when AGI has a 50% prob­a­bil­ity of ar­riv­ing in N years. If that vi­su­al­iza­tion feels very differ­ent, then any ner­vous “un­cer­tainty” you feel about do­ing P is not re­ally about whether AGI takes much longer than N years to ar­rive.

And you are almost surely going to be stuck with that feeling of “uncertainty” no matter how close AGI gets; because no matter how close AGI gets, whatever signs appear will almost surely not produce common, shared, agreed-on public knowledge that AGI has a 50% chance of arriving in N years, nor any agreement that it is therefore right and proper to react by doing P.

And if all that did be­come com­mon knowl­edge, then P is un­likely to still be a ne­glected in­ter­ven­tion, or AI al­ign­ment a ne­glected is­sue; so you will have waited un­til sadly late to help.

But far more likely is that the com­mon knowl­edge just isn’t go­ing to be there, and so it will always feel ner­vously “un­cer­tain” to con­sider act­ing.

You can ei­ther act de­spite that, or not act. Not act un­til it’s too late to help much, in the best case; not act at all un­til af­ter it’s es­sen­tially over, in the av­er­age case.

I don’t think it’s wise to wait on an un­speci­fied epistemic mir­a­cle to change how we feel. In all prob­a­bil­ity, you’re go­ing to be in this men­tal state for a while—in­clud­ing any ner­vous-feel­ing “un­cer­tainty”. If you han­dle this men­tal state by say­ing “later”, that gen­eral policy is not likely to have good re­sults for Earth.

* * *

Fur­ther re­sources: