Three Stories for How AGI Comes Before FAI

Epistemic status: fake framework

To do effective differential technological development for AI safety, we’d like to know which combinations of AI insights are more likely to lead to FAI vs UFAI. This is an overarching strategic consideration which feeds into questions like how to think about the value of AI capabilities research.

As far as I can tell, there are actually several different stories for how we may end up with a set of AI insights which makes UFAI more likely than FAI, and these stories aren’t entirely compatible with one another.

Note: In this document, when I say “FAI”, I mean any superintelligent system which does a good job of helping humans (so an “aligned Task AGI” also counts).

Story #1: The Roadblock Story

Nate Soares describes the roadblock story in this comment:

...if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are “your team runs into a capabilities roadblock and can’t achieve AGI” or “your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time.”

(emphasis mine)

The roadblock story happens if there are key safety insights that FAI needs but AGI doesn’t need. In this story, the knowledge needed for FAI is a superset of the knowledge needed for AGI. If the safety insights are difficult to obtain, or no one is working to obtain them, we could find ourselves in a situation where we have all the AGI insights without having all the FAI insights.
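The superset relationship can be made concrete with a toy model in which insights are set elements. The insight names below are hypothetical placeholders, not a claim about which insights AGI or FAI actually require; the sketch only shows the shape of the precarious state, where every AGI insight is known while some FAI insight is still missing.

```python
# Toy model of the roadblock story; every insight name here is hypothetical.
AGI_INSIGHTS = frozenset({"world_modeling", "planning", "learning"})
SAFETY_INSIGHTS = frozenset({"value_specification", "corrigibility"})
# In this story, FAI needs every AGI insight plus some safety insights.
FAI_INSIGHTS = AGI_INSIGHTS | SAFETY_INSIGHTS


def status(discovered):
    """Return (can_build_agi, can_build_fai) for a set of discovered insights."""
    return AGI_INSIGHTS <= discovered, FAI_INSIGHTS <= discovered


# The precarious state: all AGI insights are known, but safety work lags behind.
discovered = AGI_INSIGHTS | {"corrigibility"}
print(status(discovered))                  # (True, False)
print(sorted(FAI_INSIGHTS - discovered))   # ['value_specification']
```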

There is a subtlety here. In order to make a strong argument for the existence of insights like this, it’s not enough to point to failures of existing systems or describe hypothetical failures of future systems. You also need to explain why the insights necessary to create AGI wouldn’t be sufficient to fix the problems.

Some possible ways the roadblock story could come about:

  • Maybe safety insights are more or less agnostic to the chosen AGI technology and can be discovered in parallel. (Stuart Russell has pushed against this, saying that just as making sure bridges don’t fall down is part of civil engineering, safety should be part of mainstream AI research.)

  • Maybe safety insights require AGI insights as a prerequisite, leaving us in a precarious position where we will have acquired the capability to build an AGI before we begin critical FAI research.

    • This could be the case if the needed safety insights are mostly about how to safely assemble AGI insights into an FAI. It’s possible we could do a bit of this work in advance by developing “contingency plans” for how we would construct FAI given plausible combinations of capabilities advances.

      • Paul Christiano’s IDA (iterated distillation and amplification) framework could be considered a contingency plan for the case where we develop much more powerful imitation learning.

      • Contingency plans could also be helpful for directing differential technological development, since we’d get a sense of the difficulty of FAI under various tech development scenarios.

  • Maybe there will be multiple subsets of the insights needed for FAI which are sufficient for AGI.

    • In this case, we’d like to speed the discovery of whichever FAI insight will be discovered last.
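The last bullet can be illustrated with a toy timing model. It assumes, hypothetically, that each insight has a fixed discovery year, that AGI becomes possible as soon as any sufficient subset of insights is complete, and that FAI needs every insight. The insight labels, subsets, and years are invented for illustration.

```python
# Toy timing model; insight labels, subsets, and years are all hypothetical.
discovery_year = {"a": 2030, "b": 2032, "c": 2035, "d": 2040}
# Each of these subsets of the FAI insights is (by assumption) sufficient for AGI.
AGI_SUFFICIENT_SUBSETS = [{"a", "b"}, {"b", "c"}]


def agi_year(years):
    # AGI arrives once any sufficient subset has been fully discovered.
    return min(max(years[i] for i in subset) for subset in AGI_SUFFICIENT_SUBSETS)


def fai_year(years):
    # FAI arrives only once every FAI insight has been discovered.
    return max(years.values())


def speed_up(years, insight, amount):
    # Copy the schedule with one insight discovered `amount` years earlier.
    new_years = dict(years)
    new_years[insight] -= amount
    return new_years


print("baseline", agi_year(discovery_year), fai_year(discovery_year))
for insight in sorted(discovery_year):
    faster = speed_up(discovery_year, insight, 3)
    print(insight, agi_year(faster), fai_year(faster))
```

In this model the baseline is AGI in 2032 and FAI in 2040. Speeding up `d`, the insight discovered last, is the only change that moves FAI earlier (to 2037), while speeding up `b` pulls AGI forward to 2030 because `b` belongs to a subset that is sufficient for AGI.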

Story #2: The Security Story

From Security Mindset and the Logistic Success Curve:

CORAL: You know, back in mainstream computer security, when you propose a new way of securing a system, it’s considered traditional and wise for everyone to gather around and try to come up with reasons why your idea might not work. It’s understood that no matter how smart you are, most seemingly bright ideas turn out to be flawed, and that you shouldn’t be touchy about people trying to shoot them down.

The main difference between the security story and the roadblock story is that in the security story, it’s not obvious that the system is misaligned.

We can subdivide the security story based on how easy it is to fix a flaw if we’re able to detect it in advance. For example, injection, the #1 vulnerability on the 2017 OWASP Top 10, is typically easy to patch once it’s discovered. Insecure systems are often right next to secure systems in program space.
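To make the “right next to each other in program space” point concrete, here is a minimal sketch using Python’s built-in `sqlite3` module; the table, user names, and payload are made up for illustration. The vulnerable and patched lookups differ by a single line: string formatting versus a bound query parameter.

```python
import sqlite3

# Hypothetical example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice-secret"), ("bob", "bob-secret")],
)


def lookup_insecure(name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = "SELECT secret FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def lookup_secure(name):
    # Patched: the input is passed as a bound parameter and never parsed as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
print(lookup_insecure(payload))  # returns every user's secret
print(lookup_secure(payload))    # []
```

The payload turns the insecure query into `... WHERE name = 'nobody' OR '1'='1'`, which matches every row, while the parameterized version treats the whole payload as a literal name and matches nothing.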

If the security story is what we are worried about, it could be wise to try & develop the AI equivalent of OWASP’s Cheat Sheet Series, to make it easier for people to find security problems with AI systems. Of course, many items on the cheat sheet would be speculative, since AGI doesn’t actually exist yet. But it could still serve as a useful starting point for brainstorming flaws.

Differential technological development could be useful in the security story if we push for the development of AI tech that is easier to secure. However, it’s not clear how confident we can be in our intuitions about what will or won’t be easy to secure. In his book Thinking, Fast and Slow, Daniel Kahneman describes his adversarial collaboration with expertise researcher Gary Klein. Kahneman was an expertise skeptic, and Klein an expertise booster:

We eventually concluded that our disagreement was due in part to the fact that we had different experts in mind. Klein had spent much time with fireground commanders, clinical nurses, and other professionals who have real expertise. I had spent more time thinking about clinicians, stock pickers, and political scientists trying to make unsupportable long-term forecasts. Not surprisingly, his default attitude was trust and respect; mine was skepticism.


When do judgments reflect true expertise? … The answer comes from the two basic conditions for acquiring a skill:

  • an environment that is sufficiently regular to be predictable

  • an opportunity to learn these regularities through prolonged practice

In a less regular, or low-validity, environment, the heuristics of judgment are invoked. System 1 is often able to produce quick answers to difficult questions by substitution, creating coherence where there is none. The question that is answered is not the one that was intended, but the answer is produced quickly and may be sufficiently plausible to pass the lax and lenient review of System 2. You may want to forecast the commercial future of a company, for example, and believe that this is what you are judging, while in fact your evaluation is dominated by your impressions of the energy and competence of its current executives. Because substitution occurs automatically, you often do not know the origin of a judgment that you (your System 2) endorse and adopt. If it is the only one that comes to mind, it may be subjectively undistinguishable from valid judgments that you make with expert confidence. This is why subjective confidence is not a good diagnostic of accuracy: judgments that answer the wrong question can also be made with high confidence.

Our intuitions are only as good as the data we’ve seen. “Gathering data” for an AI security cheat sheet could be helpful for developing security intuition. But I think we should be skeptical of intuition anyway, given the speculative nature of the topic.

Story #3: The Alchemy Story

Ali Rahimi and Ben Recht describe the alchemy story in their Test-of-time award presentation at the NeurIPS machine learning conference in 2017 (video):

Batch Norm is a technique that speeds up gradient descent on deep nets. You sprinkle it between your layers and gradient descent goes faster. I think it’s ok to use techniques we don’t understand. I only vaguely understand how an airplane works, and I was fine taking one to this conference. But it’s always better if we build systems on top of things we do understand deeply? This is what we know about why batch norm works well. But don’t you want to understand why reducing internal covariate shift speeds up gradient descent? Don’t you want to see evidence that Batch Norm reduces internal covariate shift? Don’t you want to know what internal covariate shift is? Batch Norm has become a foundational operation for machine learning. It works amazingly well. But we know almost nothing about it.

(emphasis mine)
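For readers who haven’t seen it, the training-time computation Rahimi is describing is short, which is part of his point: the mechanics are easy to write down, while the explanation for why it helps is not. The sketch below follows the transform from Ioffe and Szegedy’s 2015 batch normalization paper for a single feature over one minibatch. It omits the running averages used at inference time, and the default `gamma`, `beta`, and `eps` values are assumptions chosen for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for one feature over a minibatch of scalars."""
    m = len(batch)
    mean = sum(batch) / m
    var = sum((x - mean) ** 2 for x in batch) / m  # biased (divide-by-m) variance
    # Normalize to roughly zero mean and unit variance within the minibatch.
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # Learned scale and shift, so the layer can undo the normalization if useful.
    return [gamma * v + beta for v in x_hat]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])
```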

The alchemy story has similarities to both the roadblock story and the security story.

From the perspective of the roadblock story, “alchemical” insights could be viewed as insights which could be useful if we only cared about creating AGI, but are too unreliable to use in an FAI. (It’s possible there are other insights which fall into the “usable for AGI but not FAI” category for reasons other than their alchemical nature; if you can think of any, I’d be interested to hear.)

In some ways, alchemy could be worse than a clear roadblock. Not everyone may agree on whether the systems are reliable enough to form the basis of an FAI, and then we’re looking at a unilateralist’s curse scenario.
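A crude way to see why disagreement is dangerous: suppose, as a simplifying assumption, that each of n independent teams wrongly concludes an alchemical system is reliable enough to build on with the same probability p. If any one team proceeding is enough, the chance that someone proceeds is 1 - (1 - p)^n, which grows quickly with n. The function name and the values of p and n below are arbitrary illustrations.

```python
def p_any_acts(p_single, n):
    """Chance that at least one of n independent teams proceeds, if each errs with p_single."""
    return 1 - (1 - p_single) ** n


# Hypothetical numbers: each team has a 10% chance of wrongly judging the system reliable.
for n in (1, 5, 20):
    print(n, round(p_any_acts(0.1, n), 3))
```

With p = 0.1, the chance that at least one team proceeds rises from 0.1 for a single team to about 0.41 for five teams and about 0.88 for twenty.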

Just as chemistry only came after alchemy, it’s possible that we’ll first develop the capability to create AGI via alchemical means, and only acquire the deeper understanding necessary to create a reliable FAI later. (This is the scenario from the roadblock section where FAI insights require AGI insights as a prerequisite.) To prevent this, we could try & deepen our understanding of components we expect to fail in subtle ways, and retard the development of components we expect to “just work” without any surprises once invented.

From the perspective of the security story, “alchemical” insights could be viewed as components which are clearly prone to vulnerabilities. Alchemical components could produce failures which are hard to understand or summarize, let alone fix. From a differential technological development point of view, the best approach may be to differentially advance less alchemical, more interpretable AI paradigms, developing the AI equivalent of reliable cryptographic primitives. (Note that explainability is inferior to interpretability.)

Trying to create an FAI from alchemical components is obviously not the best idea. But it’s not totally clear how much of a risk these components pose, because if the components don’t work reliably, an AGI built from them may not work well enough to pose a threat. Such an AGI could work better over time if it’s able to improve its own components. In this case, we might be able to program it so it periodically re-evaluates its training data as its components get upgraded, so its understanding of human values improves as its components improve.

Discussion Questions

  • How plausible does each story seem?

  • What possibilities aren’t covered by the taxonomy provided?

  • What distinctions does this framework fail to capture?

  • Which claims are incorrect?