FAI Research Constraints and AGI Side Effects

Ozzie Gooen and Justin Shovelain


Friendly artificial intelligence (FAI) researchers face at least two significant challenges. First, they must produce a significant amount of FAI research in a short amount of time. Second, they must do so without producing enough artificial general intelligence (AGI) research to result in the creation of an unfriendly artificial intelligence (UFAI). We estimate the requirements of both of these challenges using two simple models.

Our first model describes a friendliness ratio and a leakage ratio for FAI research projects. These provide limits on the allowable amount of AGI knowledge produced per unit of FAI knowledge in order for a project to be net beneficial.

Our second model studies a hypothetical FAI venture, which is responsible for ensuring FAI creation. We estimate the necessary total FAI research per year from the venture and the leakage ratio of that research. This model demonstrates a trade-off between the speed of FAI research and the proportion of AGI research that can be revealed as part of it. If FAI research takes too long, the acceptable leakage ratio may become so low that it is nearly impossible to safely produce any new research.


An artificial general intelligence (AGI) is an AI that could perform all the intellectual tasks a human can.[1] Once one is created, it may recursively improve its own intelligence to the point where it is vastly superior to human intelligence.[2] This AGI could be either friendly or unfriendly, where friendliness means it would have values that humans would favor, and unfriendliness means that it would not.[3]

It is likely that if we do not explicitly understand how to make a friendly artificial general intelligence, then by the time we make an artificial general intelligence, it will be unfriendly.[4] It is also likely that we are much further from understanding how to make a friendly artificial intelligence than we are from understanding how to make an artificial general intelligence.[5][6]

Thus, it is important to create more FAI research, but it may also be important to avoid producing much AGI research in the process. If it is 10 times as difficult to understand how to make an FAI as to make an AGI, then an FAI research paper that also produces 0.3 equivalent papers’ worth of AGI research will probably increase the chances of a UFAI. Given the close relationship between FAI and AGI research, producing FAI research with a net positive impact may be difficult.

Model 1. The Friendliness and Leakage Ratios for an FAI Project

The Friendliness Ratio

Let’s imagine that there is a necessary amount of research to build an AGI, Gremaining. There is also some necessary amount of research to build an FAI, Fremaining. These two have units of rg (general AI research) and rf (friendly AI research), which are not precisely defined but are directly comparable.

Which threshold is higher? According to much of the research in this field, Fremaining. We need significantly more research to create a friendly AI than an unfriendly one.

Figure 1. Example research thresholds for AGI and FAI.

To understand the relationship between these thresholds, we use the following equation:

fglobal = Fremaining / Gremaining

We call this the friendliness ratio. The friendliness ratio is useful for a high-level understanding of the world’s total FAI research requirements and is a heuristic guide to how difficult the problem of differential technological development is.

The friendliness ratio is high when Fremaining is much larger than Gremaining. For example, if there are 2000 units of remaining research for an FAI and 20 units for an AGI, the friendliness ratio would be 100. If someone published research with 20 units of FAI research but 1 unit of AGI research, their research would fall short of the friendliness ratio requirement (a ratio of 20 against the required 100) and would thus make the problem even worse.
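As an illustrative sketch, the check above can be computed directly. All numbers here are the hypothetical example values from this section, not real estimates:

```python
# Friendliness ratio check, using the hypothetical numbers above.

def friendliness_ratio(fai_research: float, agi_research: float) -> float:
    """Units of FAI research per unit of AGI research."""
    return fai_research / agi_research

# Global requirement: 2000 units of FAI research remain vs. 20 units of AGI.
f_global = friendliness_ratio(2000, 20)   # 100

# A paper producing 20 units of FAI research and 1 unit of AGI research:
f_paper = friendliness_ratio(20, 1)       # 20

# The paper falls well short of the required ratio, so it is net negative.
print(f_global, f_paper, f_paper >= f_global)
```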

The Leakage Ratio

For specific projects it may be useful to have a measure that focuses directly on the negative outcome.

For this we can use the leakage ratio, which represents the amount of undesired AGI research created per unit of FAI research. It is simply the inverse of the friendliness ratio:

lproject = Gproject / Fproject

In order for a project to be net beneficial, its leakage ratio must fall below the global leakage ratio:

lproject < lglobal = Gremaining / Fremaining

Estimating if a Project is Net Friendliness-Positive

Question: How can one estimate if a project is net friendliness-positive?

A naive answer would be to make sure that its friendliness ratio falls above the global friendliness ratio, or equivalently that its leakage ratio falls below the global leakage ratio.

Global AI research rates need to fulfill the friendliness ratio in order to produce an FAI. Therefore, if an advance in friendliness research is produced with FAI research Fproject, but in the process also produces AGI research Gproject, then it would be net friendliness-negative if

Fproject / Gproject < fglobal

Later research would need to make up for this imbalance.

AI Research Example

Say that Gremaining = 200rg and Fremaining = 2000rf, leading to a friendliness ratio of fglobal = 10 and a global maximum leakage ratio of lglobal = 0.1. In this case, specific research projects could be evaluated to make sure that they meet this threshold. One could imagine an organization deciding what research to do or publish using the following chart.

Project     Description    FAI Research per Unit of AGI Research    Leakage Ratio
1                          6                                        0.17
2           Math Paper     11                                       0.09
3                          14                                       0.07

In this case, only Projects 2 and 3 have a leakage ratio of less than 0.1, meaning that only these would be net beneficial. Even though Project 1 has generated safety research, it would be net negative.
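The screening above can be sketched as a small calculation. The project values are the hypothetical numbers from the chart, with each project assumed to produce one unit of AGI research:

```python
# Screening hypothetical projects against the global maximum leakage ratio.

L_GLOBAL = 0.1  # from f_global = 10 in the example

# (name, FAI research units, AGI research units) -- hypothetical values
projects = [
    ("Project 1", 6.0, 1.0),
    ("Project 2 (Math Paper)", 11.0, 1.0),
    ("Project 3", 14.0, 1.0),
]

for name, fai, agi in projects:
    leakage = agi / fai
    verdict = "net beneficial" if leakage < L_GLOBAL else "net negative"
    print(f"{name}: leakage {leakage:.2f} -> {verdict}")
```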

Model 1 Assumptions:

1. There exists some threshold Gremaining of research necessary to generate an unfriendly artificial intelligence.

2. There exists some threshold Fremaining of research necessary to generate a friendly artificial intelligence.

3. If Gremaining is reached before Fremaining, a UFAI will be created. If after, an FAI will be created.

Model 2. AGI Leakage Limits of an FAI Venture

Question: How can an FAI venture ensure the creation of an FAI?

Let’s imagine a group that plans to ensure that an FAI is created. We call this an FAI venture.

This venture would be constrained by time. AGI research is being created internationally and, if left alone, will likely create a UFAI. We can consider research done outside of the venture as external research and research within the venture as internal research. If internal research is done too slowly, or if it leaks too much AGI research, an unfriendly artificial intelligence could be created before Fremaining is met.

We thus split FAI and AGI research creation into two categories, external and internal. Then we consider the derivative of each with respect to time. For simplicity, we assume the unit of time is years.

G’i = AGI research produced internally per year

F’i = FAI research produced internally per year

G’e = AGI research produced externally per year

F’e = FAI research produced externally per year

There exist times, tf and tg, at which the friendly and general remaining research thresholds are met.

tf = Year in which Fremaining is met

tg = Year in which Gremaining is met

These times can be estimated as follows:

tf = Fremaining / (F’i + F’e)

tg = Gremaining / (G’i + G’e)

The venture wants to make sure that tf < tg so that the eventual AI is friendly (assumption 3). With this, we find that:

F’i > C1 * G’i + C0

where C1 = fglobal and C0 = fglobal * G’e − F’e; the values of C0 and C1 both include the friendliness ratio fglobal.

This implies a linear relationship between F’i and G’i. The more FAI research the FAI venture can produce, the more AGI research it is allowed to leak.

This gives us a clean way to go from a G’i value the venture could expect to the F’i it would need to be successful.
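Under assumption 4 (constant rates), the timing check is a one-line calculation. A minimal sketch, with all rates purely hypothetical:

```python
# Model 2 timing check with hypothetical, constant research rates.

f_remaining, g_remaining = 2000.0, 200.0   # research thresholds (rf, rg)
fi, fe = 150.0, 50.0   # internal / external FAI research per year
gi, ge = 2.0, 8.0      # internal / external AGI research per year

t_f = f_remaining / (fi + fe)   # year in which Fremaining is met: 10.0
t_g = g_remaining / (gi + ge)   # year in which Gremaining is met: 20.0

# The venture succeeds only if the FAI threshold is crossed first.
print(t_f, t_g, t_f < t_g)
```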

The C0 value describes the absolute minimum amount of FAI research necessary to have a chance at a successful outcome. While the acceptable leakage ratio at that minimum would be impossible to meet (it is exactly zero), the baseline is easy to calculate. Assuming that F’e << fglobal * G’e, we can estimate that

C0 ≈ fglobal * G’e

If we instead wanted to calculate G’i from F’i, we could use the following equation. This may map more directly onto the intentions of a venture (finding the acceptable amount of AGI leakage after estimating FAI productivity):

G’i < (F’i − C0) / C1 = (F’i + F’e) / fglobal − G’e

Model 2 Example

For example, let’s imagine that fglobal = 10 and G’e = 10rg per year, with F’e negligible. In this case, C0 = 100rf per year. This means that if the venture could make sure to leak exactly nothing (lproject = 0), it would need to average an FAI research rate of 10 times the entire world’s output of AGI research. This amount increases as 100 / (1 − 10 * lproject). If the venture expects an estimated leakage ratio of 0.05, it would need to double its research output to 200rf per year, or 20 times global AGI output.

Figure 2. F’i per unit of maximum permissible G’i.
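The example's required rates follow from F’i ≥ fglobal * G’e / (1 − fglobal * lproject), which is the linear relationship rearranged in terms of the venture's leakage ratio under the assumption that F’e is negligible. A sketch with the example's values:

```python
# Required internal FAI research rate, given the venture's leakage ratio.
# Assumes external FAI research (F'e) is negligible.

def required_fai_rate(f_global: float, g_e: float, l_project: float) -> float:
    denominator = 1.0 - f_global * l_project
    if denominator <= 0:
        raise ValueError("Leakage too high: no FAI research rate suffices.")
    return f_global * g_e / denominator

# With f_global = 10 and G'e = 10 rg/year, as in the example:
print(required_fai_rate(10, 10, 0.0))    # 100.0 rf/year (10x global AGI output)
print(required_fai_rate(10, 10, 0.05))   # 200.0 rf/year (20x global AGI output)
```

Note how the required rate diverges as lproject approaches 1/fglobal, matching the "nearly impossible" regime described earlier.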

What to do?

The numbers in the example above are a bit depressing. There is so much global AI research that it seems difficult to imagine the world averaging an even higher rate of FAI research, which would be necessary if the friendliness ratio is greater than 1.

There are some upsides. First, much hard AI work is done privately in technology companies without being published, limiting the rate at which AGI research spreads. Second, the quantities rg and rf don’t perfectly correspond to the difficulty of producing them. It may be that we face diminishing marginal returns at our current levels of rg, so similar levels of rf will be easier to reach.

It’s possible that Fremaining may be surprisingly low or that Gremaining may be surprisingly high.

Projects with high leakage ratios don’t have to be completely avoided or hidden. The G’i value specifically counts research that ends up in the hands of the group that eventually creates an AGI, so FAI research organizations could share high-risk information among each other as long as it doesn’t leak externally. The FAI venture mentioned above could be viewed as a collection of organizations rather than one specific one. It may even be difficult for AGI research implications to move externally if the FAI academic literature is significantly separated from the AGI academic literature. This logic provides a heuristic guide to choosing research projects, deciding whether to publish research already done, and managing concentrations of information.

Model 2 Assumptions:

1-3. The same three assumptions as the previous model.

4. The rates of research creation will be fairly constant.

5. External and internal rates of research do not influence each other.


The friendliness ratio provides a high-level understanding of the amount of global FAI research needed per unit of AGI research to create an FAI. The leakage ratio is the inverse of the friendliness ratio applied to a specific FAI project, and specifies whether that project is net friendliness-positive. Together these can be used to understand the scale of the challenge and to tell whether a particular project is net beneficial or net harmful.

To understand the challenges facing an FAI venture, we found the simple equation

F’i > C1 * G’i + C0, with C1 = fglobal and C0 = fglobal * G’e − F’e

which relates the venture’s required internal FAI research rate to the AGI research it leaks and to the world’s external research rates.

This paper focused on establishing the models above rather than estimating input values. If the models are considered useful, there should be more research to estimate these numbers. The models could also be improved to incorporate uncertainty, growing returns to research, and other important limitations that we haven’t considered. Finally, the friendliness ratio concept naturally generalizes to other technology-induced existential risks.


a. Math manipulation for Model 2

We start from the requirement that the FAI threshold be met first:

tf < tg

Fremaining / (F’i + F’e) < Gremaining / (G’i + G’e)

This last equation can be written as

F’i + F’e > (Fremaining / Gremaining) * (G’i + G’e)

Recalling the friendliness ratio, fglobal = Fremaining / Gremaining, we can simplify these constructs further:

F’i > fglobal * G’i + fglobal * G’e − F’e = C1 * G’i + C0
[1] Luke Muehlhauser, "What is AGI?", https://intelligence.org/2013/08/11/what-is-agi/, 2013.

[2] MIRI, "Intelligence Explosion FAQ", https://intelligence.org/ie-faq/.

[3] Eliezer Yudkowsky, "Artificial Intelligence as a Positive and Negative Factor in Global Risk", in Global Catastrophic Risks, 2008.

[4] Nate Soares and Benja Fallenstein, "Aligning Superintelligence with Human Interests: A Technical Research Agenda", MIRI, https://intelligence.org/files/TechnicalAgenda.pdf.

[5] Nick Bostrom, Superintelligence, 2014.

[6] Eliezer Yudkowsky, "The Challenge of Friendly AI", https://www.youtube.com/watch?v=nkB1e-JCgmY, 2007.