Thoughts on the Singularity Institute (SI)

This post presents thoughts on the Singularity Institute from Holden Karnofsky, Co-Executive Director of GiveWell. Note: Luke Muehlhauser, the Executive Director of the Singularity Institute, reviewed a draft of this post, and commented: “I do generally agree that your complaints are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on ‘tool’ AI). I am working to address both categories of issues.” I take Luke’s comment to be a significant mark in SI’s favor, because it indicates an explicit recognition of the problems I raise, and thus increases my estimate of the likelihood that SI will work to address them.

September 2012 update: responses have been posted by Luke and Eliezer (and I have responded in the comments of their posts). I have also added acknowledgements.

The Singularity Institute (SI) is a charity that GiveWell has been repeatedly asked to evaluate. In the past, SI has been outside our scope (as we were focused on specific areas such as international aid). With GiveWell Labs we are open to any giving opportunity, no matter what form and what sector, but we still do not currently plan to recommend SI; given the amount of interest some of our audience has expressed, I feel it is important to explain why. Our views, of course, remain open to change. (Note: I am posting this only to Less Wrong, not to the GiveWell Blog, because I believe that everyone who would be interested in this post will see it here.)

I am currently the GiveWell staff member who has put the most time and effort into engaging with and evaluating SI. Other GiveWell staff currently agree with my bottom-line view that we should not recommend SI, but this does not mean they have engaged with each of my specific arguments. Therefore, while the lack of recommendation of SI is something that GiveWell stands behind, the specific arguments in this post should be attributed only to me, not to GiveWell.

Summary of my views

  • The argument advanced by SI for why the work it’s doing is beneficial and important seems both wrong and poorly argued to me. My sense at the moment is that the arguments SI is making would, if accepted, increase rather than decrease the risk of an AI-related catastrophe. More

  • SI has, or has had, multiple properties that I associate with ineffective organizations, and I do not see any specific evidence that its personnel/organization are well-suited to the tasks it has set for itself. More

  • A common argument for giving to SI is that “even an infinitesimal chance that it is right” would be sufficient given the stakes. I have written previously about why I reject this reasoning; in addition, prominent SI representatives seem to reject this particular argument as well (i.e., they believe that one should support SI only if one believes it is a strong organization making strong arguments). More

  • My sense is that at this point, given SI’s current financial state, withholding funds from SI is likely better for its mission than donating to it. (I would not take this view to the furthest extreme; the argument that SI should have some funding seems stronger to me than the argument that it should have as much as it currently has.)

  • I find existential risk reduction to be a fairly promising area for philanthropy, and plan to investigate it further. More

  • There are many things that could happen that would cause me to revise my view on SI. However, I do not plan to respond to all comment responses to this post. (Given the volume of responses we may receive, I may not be able to even read all the comments on this post.) I do not believe these two statements are inconsistent, and I lay out paths for getting me to change my mind that are likely to work better than posting comments. (Of course I encourage people to post comments; I’m just noting in advance that this action, alone, doesn’t guarantee that I will consider your argument.) More

Intent of this post

I did not write this post with the purpose of “hurting” SI. Rather, I wrote it in the hopes that one of these three things (or some combination) will happen:

  1. New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).

  2. SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.

  3. SI can’t or won’t make changes, and SI’s supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

Which one of these occurs will hopefully be driven primarily by the merits of the different arguments raised. Because of this, I think that whatever happens as a result of my post will be positive for SI’s mission, whether or not it is positive for SI as an organization. I believe that most of SI’s supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Does SI have a well-argued case that its work is beneficial and important?

I know no more concise summary of SI’s views than this page, so here I give my own impressions of what SI believes, in italics.

  1. There is some chance that in the near future (next 20-100 years), an “artificial general intelligence” (AGI)—a computer that is vastly more intelligent than humans in every relevant way—will be created.

  2. This AGI will likely have a utility function and will seek to maximize utility according to this function.

  3. This AGI will be so much more powerful than humans—due to its superior intelligence—that it will be able to reshape the world to maximize its utility, and humans will not be able to stop it from doing so.

  4. Therefore, it is crucial that its utility function be one that is reasonably harmonious with what humans want. A “Friendly” utility function is one that is reasonably harmonious with what humans want, such that a “Friendly” AGI (FAI) would change the world for the better (by human standards) while an “Unfriendly” AGI (UFAI) would essentially wipe out humanity (or worse).

  5. Unless great care is taken specifically to make a utility function “Friendly,” it will be “Unfriendly,” since the things humans value are a tiny subset of the things that are possible.

  6. Therefore, it is crucially important to develop “Friendliness theory” that helps us to ensure that the first strong AGI’s utility function will be “Friendly.” The developer of Friendliness theory could use it to build an FAI directly or could disseminate the theory so that others working on AGI are more likely to build FAI as opposed to UFAI.

From the time I first heard this argument, it has seemed to me to be skipping important steps and making major unjustified assumptions. However, for a long time I believed this could easily be due to my inferior understanding of the relevant issues. I believed my own views on the argument to have only very low relevance (as I stated in my 2011 interview with SI representatives). Over time, I have had many discussions with SI supporters and advocates, as well as with non-supporters who I believe understand the relevant issues well. I now believe—for the moment—that my objections are highly relevant, that they cannot be dismissed as simple “layman’s misunderstandings” (as they have been by various SI supporters in the past), and that SI has not published anything that addresses them in a clear way.

Below, I list my major objections. I do not believe that these objections constitute a sharp/tight case for the idea that SI’s work has low/negative value; I believe, instead, that SI’s own arguments are too vague for such a rebuttal to be possible. There are many possible responses to my objections, but SI’s public arguments (and the private arguments) do not make clear which possible response (if any) SI would choose to take up and defend. Hopefully the dialogue following this post will clarify what SI believes and why.

Some of my views are discussed at greater length (though with less clarity) in a public transcript of a conversation I had with SI supporter Jaan Tallinn. I refer to this transcript as “Karnofsky/Tallinn 2011.”

Objection 1: it seems to me that any AGI that was set to maximize a “Friendly” utility function would be extraordinarily dangerous.

Suppose, for the sake of argument, that SI manages to create what it believes to be an FAI. Suppose that it is successful in the “AGI” part of its goal, i.e., it has successfully created an intelligence vastly superior to human intelligence and extraordinarily powerful from our perspective. Suppose that it has also done its best on the “Friendly” part of the goal: it has developed a formal argument for why its AGI’s utility function will be Friendly, it believes this argument to be airtight, and it has had this argument checked over by 100 of the world’s most intelligent and relevantly experienced people. Suppose that SI now activates its AGI, unleashing it to reshape the world as it sees fit. What will be the outcome?

I believe that the probability of an unfavorable outcome—by which I mean an outcome essentially equivalent to what a UFAI would bring about—exceeds 90% in such a scenario. I believe the goal of designing a “Friendly” utility function is likely to be beyond the abilities even of the best team of humans willing to design such a function. I do not have a tight argument for why I believe this, but a comment on LessWrong by Wei Dai gives a good illustration of the kind of thoughts I have on the matter:

What I’m afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of “safety” used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace “safety” with “security”. These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There’s good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I’m coming from.

I think this comment understates the risks, however. For example, when the comment says “the formalization of the notion of ‘safety’ used by the proof is wrong,” it is not clear whether it means that the values the programmers have in mind are not correctly implemented by the formalization, or whether it means they are correctly implemented but are themselves catastrophic in a way that hasn’t been anticipated. I would be highly concerned about both. There are other catastrophic possibilities as well; perhaps the utility function itself is well-specified and safe, but the AGI’s model of the world is flawed (in particular, perhaps its prior or its process for matching observations to predictions is flawed) in a way that doesn’t emerge until the AGI has made substantial changes to its environment.

By SI’s own arguments, even a small error in any of these things would likely lead to catastrophe. And there are likely failure forms I haven’t thought of. The overriding intuition here is that complex plans usually fail when unaccompanied by feedback loops. A scenario in which a set of people is ready to unleash an all-powerful being to maximize some parameter in the world, based solely on their initial confidence in their own extrapolations of the consequences of doing so, seems like a scenario that is overwhelmingly likely to result in a bad outcome. It comes down to placing the world’s largest bet on a highly complex theory—with no experimentation to test the theory first.

So far, all I have argued is that the development of “Friendliness” theory can achieve at best only a limited reduction in the probability of an unfavorable outcome. However, as I argue in the next section, I believe there is at least one concept—the “tool-agent” distinction—that has more potential to reduce risks, and that SI appears to ignore this concept entirely. I believe that tools are safer than agents (even agents that make use of the best “Friendliness” theory that can reasonably be hoped for) and that SI encourages a focus on building agents, thus increasing risk.

Objection 2: SI appears to neglect the potentially important distinction between “tool” and “agent” AI.

Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.

Google Maps—by which I mean the complete software package including the display of the map itself—does not have a “utility” that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single “parameter to be maximized” driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don’t like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone’s navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to “trick” me in order to increase its utility.

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
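The route-planning pattern described above can be sketched in a few lines. This is an entirely hypothetical toy, not Google Maps’ actual design: candidates are scored, ranked, and handed to the user, and no step takes any action on the user’s behalf.

```python
# Toy illustration of the "tool" pattern: score candidate routes, rank
# them for a human to inspect. Nothing here executes an action.

def score_route(route, traffic_weight=1.0):
    """Lower is better: distance plus a traffic penalty the user can tune."""
    return route["distance_km"] + traffic_weight * route["traffic_delay_min"]

def rank_routes(routes, traffic_weight=1.0):
    """Return routes best-first; displaying or exporting is left to the user."""
    return sorted(routes, key=lambda r: score_route(r, traffic_weight))

routes = [
    {"name": "highway", "distance_km": 30, "traffic_delay_min": 20},
    {"name": "back roads", "distance_km": 35, "traffic_delay_min": 2},
]
# The user inspects the ranking, tweaks a parameter, and re-ranks:
print(rank_routes(routes)[0]["name"])                    # → back roads
print(rank_routes(routes, traffic_weight=0)[0]["name"])  # → highway
```

Changing `traffic_weight` is the analogue of the user adjusting parameters to consider a different route: the program re-scores and re-displays, and the human decides what to do with the result.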

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy!) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them).

The “tool mode” concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I’ve seen of Oracle AI present it as an Unfriendly AI that is “trapped in a box”—an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function—likely as difficult to construct as “Friendliness”—that leaves it “wishing” to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Execute action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive.
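The two instruction sets above can be sketched in code. This is a hedged illustration of the distinction, not any real system: all names (`score_actions`, `run_as_tool`, `run_as_agent`, `estimate_p`) are placeholders I have introduced.

```python
# Sketch of the tool/agent distinction: step (1) is identical in both
# modes; only step (2) differs.

def score_actions(actions, data, estimate_p):
    """Step (1), shared by both modes: score each candidate action's
    effect on parameter P against the fixed data set D."""
    return {action: estimate_p(action, data) for action in actions}

def run_as_tool(actions, data, estimate_p):
    """Tool version of step (2): summarize the calculation for a human.
    Nothing here executes an action in the world."""
    scores = score_actions(actions, data, estimate_p)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def run_as_agent(actions, data, estimate_p, execute):
    """Agent version of step (2): act on the highest-scoring option."""
    scores = score_actions(actions, data, estimate_p)
    best = max(scores, key=scores.get)
    execute(best)  # this single line is the entire tool/agent difference
    return best
```

In the tool version the ranked list goes to a human; in the agent version `execute` acts directly. The point of the paragraph above is that once step (1) is separable, choosing between these two step-(2) bodies is a design decision, not an inherent property of the intelligence.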

I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.

This is important because an AGI running in tool mode could be extraordinarily useful but far safer than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot. Among other things, a tool-AGI would allow transparent views into the AGI’s reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an “agent.”

Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work, given that practically all software developed to date can (and usually does) run as a tool, and given that modern software seems to be constantly becoming “intelligent” (capable of giving better answers than a human) in surprising new domains. In addition, it intuitively seems to me (though I am not highly confident) that intelligence inherently involves the distinct, separable steps of (a) considering multiple possible actions and (b) assigning a score to each, prior to executing any of the possible actions. If one can distinctly separate (a) and (b) in a program’s code, then one can abstain from writing any “execution” instructions and instead focus on making the program list actions and scores in a user-friendly manner, for humans to consider and use as they wish.

Of course, there are possible paths to AGI that may rule out a “tool mode,” but it seems that most of these paths would rule out the application of “Friendliness theory” as well. (For example, a “black box” emulation and augmentation of a human mind.) What are the paths to AGI that allow manual, transparent, intentional design of a utility function but do not allow the replacement of “execution” instructions with “communication” instructions? Most of the conversations I’ve had on this topic have focused on three responses:

  • Self-improving AI. Many seem to find it intuitive that (a) AGI will almost certainly come from an AI rewriting its own source code, and (b) such a process would inevitably lead to an “agent.” I do not agree with either (a) or (b). I discussed these issues in Karnofsky/Tallinn 2011 and will be happy to discuss them more if this is the line of response that SI ends up pursuing. Very briefly:

    • The idea of a “self-improving algorithm” intuitively sounds very powerful, but does not seem to have led to many “explosions” in software so far (and it seems to be a concept that could apply to narrow AI as well as to AGI).

    • It seems to me that a tool-AGI could be plugged into a self-improvement process that would be quite powerful but would also terminate and yield a new tool-AI after a set number of iterations (or after reaching a set “intelligence threshold”). So I do not accept the argument that “self-improving AGI means agent AGI.” As stated above, I will elaborate on this view if it turns out to be an important point of disagreement.

    • I have argued (in Karnofsky/Tallinn 2011) that the relevant self-improvement abilities are likely to come with or after—not prior to—the development of strong AGI. In other words, any software capable of the relevant kind of self-improvement is likely also capable of being used as a strong tool-AGI, with the benefits described above.

    • The SI-related discussions I’ve seen of “self-improving AI” are highly vague, and do not spell out views on the above points.

  • Dangerous data collection. Some point to the seeming dangers of a tool-AI’s “scoring” function: in order to score different options it may have to collect data, which is itself an “agent”-type action that could lead to dangerous actions. I think my definition of “tool” above makes clear what is wrong with this objection: a tool-AGI takes its existing data set D as fixed (and perhaps could have some pre-determined, safe set of simple actions it can take—such as using Google’s API—to collect more), and if maximizing its chosen parameter is best accomplished through more data collection, it can transparently output why and how it suggests collecting more data. Over time it can be given more autonomy for data collection through an experimental and domain-specific process (e.g., modifying the AI to skip specific steps of human review of proposals for data collection after it has become clear that these steps work as intended), a process that has little to do with the “Friendly overarching utility function” concept promoted by SI. Again, I will elaborate on this if it turns out to be a key point.

  • Race for power. Some have argued to me that humans are likely to choose to create agent-AGI in order to quickly gain power and outrace other teams working on AGI. But this argument, even if accepted, has very different implications from SI’s view.

    Conventional wisdom says it is extremely dangerous to empower a computer to act in the world until one is very sure that the computer will do its job in a way that is helpful rather than harmful. So if a programmer chooses to “unleash an AGI as an agent” with the hope of gaining power, it seems that this programmer will be deliberately ignoring conventional wisdom about what is safe in favor of shortsighted greed. I do not see why such a programmer would be expected to make use of any “Friendliness theory” that might be available. (Attempting to incorporate such theory would almost certainly slow the project down greatly, and thus would bring the same problems as the more general “have caution, do testing” counseled by conventional wisdom.) It seems that the appropriate measures for preventing such a risk are security measures aimed at stopping humans from launching unsafe agent-AIs, rather than developing theories or raising awareness of “Friendliness.”
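The terminating self-improvement process described in the “Self-improving AI” bullet above can be sketched as follows. This is a toy under my own assumptions: the “tool” is just a number, `improve` and `measure` are placeholders, and no real self-improvement algorithm is implied.

```python
# Toy sketch: a self-improvement loop that is powerful but terminates,
# yielding a new tool rather than an open-ended agent.

def bounded_self_improvement(tool, improve, measure,
                             max_iterations, threshold=None):
    """Apply `improve` repeatedly, stopping after a set number of
    iterations or once a set "intelligence threshold" is reached.
    The result is returned for human inspection, not deployed."""
    for _ in range(max_iterations):
        if threshold is not None and measure(tool) >= threshold:
            break
        tool = improve(tool)
    return tool  # a new tool-AI; no instruction here executes actions

# "Improvement" doubles the tool's capability score each iteration.
final = bounded_self_improvement(1, lambda t: t * 2, lambda t: t,
                                 max_iterations=5, threshold=8)
print(final)  # → 8: the threshold halts the loop before the 5th iteration
```

The point is structural: both the iteration cap and the threshold are termination conditions written by the programmers, so the process ends with a tool whose outputs humans can inspect, rather than with an open-ended maximizer.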

One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a “tool” and giving arguments for why AGI is likely to work only as an “agent.” The idea that AGI will be driven by a central utility function seems to be simply assumed. Two examples:

  • I have been referred to Muehlhauser and Salamon 2012 as the most up-to-date, clear explanation of SI’s position on “the basics.” This paper states, “Perhaps we could build an AI of limited cognitive ability — say, a machine that only answers questions: an ‘Oracle AI.’ But this approach is not without its own dangers (Armstrong, Sandberg, and Bostrom 2012).” However, the referenced paper (Armstrong, Sandberg, and Bostrom 2012) seems to take it as a given that an Oracle AI is an “agent trapped in a box”—a computer that has a basic drive/utility function, not a Tool-AGI. The rest of Muehlhauser and Salamon 2012 seems to take it as a given that an AGI will be an agent.

  • I have often been referred to Omohundro 2008 for an argument that an AGI is likely to have certain goals. But this paper seems, again, to take it as given that an AGI will be an agent, i.e., that it will have goals at all. The introduction states, “To say that a system of any design is an ‘artificial intelligence’, we mean that it has goals which it tries to accomplish by acting in the world.” In other words, the premise I’m disputing seems embedded in its very definition of AI.

The closest thing I have seen to a public discussion of “tool-AGI” is in Dreams of Friendliness, where Eliezer Yudkowsky considers the question, “Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn’t need to be Friendly. It wouldn’t need any goals at all. It would just answer questions.” His response:

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck “answers” to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are “improbable” relative to random organizations of the AI’s RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

This passage appears vague and does not appear to address the specific “tool” concept I have defended above (in particular, it does not address the analogy to modern software, which challenges the idea that “powerful optimization processes” cannot run in tool mode). The rest of the piece discusses (a) psychological mistakes that could lead to the discussion in question; (b) the “Oracle AI” concept that I have outlined above. The comments contain some more discussion of the “tool” idea (Denis Bider and Shane Legg seem to be picturing something similar to “tool-AGI”) but the discussion is unresolved and I believe the “tool” concept defended above remains essentially unaddressed.

In sum, SI appears to encourage a focus on building and launching “Friendly” agents (it is seeking to do so itself, and its work on “Friendliness” theory seems to be laying the groundwork for others to do so) while not addressing the tool-agent distinction. It seems to assume that any AGI will have to be an agent, and to make little to no attempt at justifying this assumption. The result, in my view, is that it is essentially advocating for a more dangerous approach to AI than the traditional approach to software development.

Objection 3: SI’s envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

SI’s scenario concerns the development of artificial general intelligence (AGI): a computer that is vastly more intelligent than humans in every relevant way. But we already have many computers that are vastly more intelligent than humans in some relevant ways, and the domains in which specialized AIs outdo humans seem to be constantly and continuously expanding. I feel that the relevance of “Friendliness theory” depends heavily on the idea of a “discrete jump” that seems unlikely and whose likelihood does not seem to have been publicly argued for.

One possible scenario is that at some point, we develop powerful enough non-AGI tools (particularly specialized AIs) that we vastly improve our abilities to consider and prepare for the eventuality of AGI—to the point where any previous theory developed on the subject becomes useless. Or (to put this more generally) non-AGI tools simply change the world so much that it becomes essentially unrecognizable from the perspective of today—again rendering any previous “Friendliness theory” moot. As I said in Karnofsky/Tallinn 2011, some of SI’s work “seems a bit like trying to design Facebook before the Internet was in use, or even before the computer existed.”

Perhaps there will be a discrete jump to AGI, but it will be a sort of AGI that renders “Friendliness theory” moot for a different reason. For example, in the practice of software development, there often does not seem to be an operational distinction between “intelligent” and “Friendly.” (For example, my impression is that the only method programmers had for evaluating Watson’s “intelligence” was to see whether it was coming up with the same answers that a well-informed human would; the only way to evaluate Siri’s “intelligence” was to evaluate its helpfulness to humans.) “Intelligent” often ends up getting defined as “prone to take actions that seem all-around ‘good’ to the programmer.” So the concept of “Friendliness” may end up being naturally and subtly baked into a successful AGI effort.

The bottom line is that we know very little about the course of future artificial intelligence. I believe that the probability that SI’s concept of “Friendly” vs. “Unfriendly” goals ends up seeming essentially nonsensical, irrelevant and/or unimportant from the standpoint of the relevant future is over 90%.

Other ob­jec­tions to SI’s views

There are other de­bates about the like­li­hood of SI’s work be­ing rele­vant/​helpful; for ex­am­ple,

  • It isn’t clear whether the de­vel­op­ment of AGI is im­mi­nent enough to be rele­vant, or whether other risks to hu­man­ity are closer.

  • It isn’t clear whether AGI would be as pow­er­ful as SI’s views im­ply. (I dis­cussed this briefly in Karnofsky/​Tal­linn 2011.)

  • It isn’t clear whether even an ex­tremely pow­er­ful UFAI would choose to at­tack hu­mans as op­posed to ne­go­ti­at­ing with them. (I find it some­what helpful to analo­gize UFAI-hu­man in­ter­ac­tions to hu­man-mosquito in­ter­ac­tions. Hu­mans are enor­mously more in­tel­li­gent than mosquitoes; hu­mans are good at pre­dict­ing, ma­nipu­lat­ing, and de­stroy­ing mosquitoes; hu­mans do not value mosquitoes’ welfare; hu­mans have other goals that mosquitoes in­terfere with; hu­mans would like to see mosquitoes erad­i­cated at least from cer­tain parts of the planet. Yet hu­mans haven’t ac­com­plished such erad­i­ca­tion, and it is easy to imag­ine sce­nar­ios in which hu­mans would pre­fer hon­est ne­go­ti­a­tion and trade with mosquitoes to any other ar­range­ment, if such ne­go­ti­a­tion and trade were pos­si­ble.)

Unlike the three objections I focus on, these other issues have been discussed a fair amount, and if these other issues were the only objections to SI’s arguments I would find SI’s case to be strong (i.e., I would find its scenario likely enough to warrant investment in).

Wrapping up


  • I believe the most likely future scenarios are the ones we haven’t thought of, and that the most likely fate of the sort of theory SI ends up developing is irrelevance.

  • I believe that unleashing an all-powerful “agent AGI” (without the benefit of experimentation) would very likely result in a UFAI-like outcome, no matter how carefully the “agent AGI” was designed to be “Friendly.” I see SI as encouraging (and aiming to take) this approach.

  • I believe that the standard approach to developing software results in “tools,” not “agents,” and that tools (while dangerous) are much safer than agents. A “tool mode” could facilitate experiment-informed progress toward a safe “agent,” rather than needing to get “Friendliness” theory right without any experimentation.

  • Therefore, I believe that the approach SI advocates and aims to prepare for is far more dangerous than the standard approach, so if SI’s work on Friendliness theory affects the risk of human extinction one way or the other, it will increase the risk of human extinction. Fortunately I believe SI’s work is far more likely to have no effect one way or the other.

For a long time I refrained from engaging in object-level debates over SI’s work, believing that others are better qualified to do so. But after talking at great length to many of SI’s supporters and advocates and reading everything I’ve been pointed to as relevant, I still have seen no clear and compelling response to any of my three major objections. As stated above, there are many possible responses to my objections, but SI’s current arguments do not seem clear on what responses they wish to take and defend. At this point I am unlikely to form a positive view of SI’s work until and unless I do see such responses, and/or SI changes its positions.

Is SI the kind of organization we want to bet on?

This part of the post has some risks. For most of GiveWell’s history, sticking to our standard criteria—and putting more energy into recommended than non-recommended organizations—has enabled us to share our honest thoughts about charities without appearing to get personal. But when evaluating a group such as SI, I can’t avoid placing a heavy weight on (my read on) the general competence, capability and “intangibles” of the people and organization, because SI’s mission is not about repeating activities that have worked in the past. Sharing my views on these issues could strike some as personal or mean-spirited and could lead to the misimpression that GiveWell is hostile toward SI. But it is simply necessary in order to be fully transparent about why I hold the views that I hold.

Fortunately, SI is an ideal organization for our first discussion of this type. I believe the staff and supporters of SI would overwhelmingly rather hear the whole truth about my thoughts—so that they can directly engage them and, if warranted, make changes—than have me sugar-coat what I think in order to spare their feelings. People who know me and my attitude toward being honest vs. sparing feelings know that this, itself, is high praise for SI.

One more comment before I continue: our policy is that non-public information provided to us by a charity will not be published or discussed without that charity’s prior consent. However, none of the content of this post is based on private information; all of it is based on information that SI has made available to the public.

There are several reasons that I currently have a negative impression of SI’s general competence, capability and “intangibles.” My mind remains open and I include specifics on how it could be changed.

  • Weak arguments. SI has produced enormous quantities of public argumentation, and I have examined a very large proportion of this information. Yet I have never seen a clear response to any of the three basic objections I listed in the previous section. One of SI’s major goals is to raise awareness of AI-related risks; given this, the fact that it has not advanced clear/concise/compelling arguments speaks, in my view, to its general competence.

  • Lack of impressive endorsements. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. I feel that given the enormous implications of SI’s claims, if it argued them well it ought to be able to get more impressive endorsements than it has.

    I have been pointed to Peter Thiel and Ray Kurzweil as examples of impressive SI supporters, but I have not seen any on-record statements from either of these people that show agreement with SI’s specific views, and in fact (based on watching them speak at Singularity Summits) my impression is that they disagree. Peter Thiel seems to believe that speeding the pace of general innovation is a good thing; this would seem to be in tension with SI’s view that AGI will be catastrophic by default and that no one other than SI is paying sufficient attention to “Friendliness” issues. Ray Kurzweil seems to believe that “safety” is a matter of transparency, strong institutions, etc. rather than of “Friendliness.” I am personally in agreement with the things I have seen both of them say on these topics. I find it possible that they support SI because of the Singularity Summit or to increase general interest in ambitious technology, rather than because they find “Friendliness theory” to be as important as SI does.

    Clear, on-record statements from these two supporters, specifically endorsing SI’s arguments and the importance of developing Friendliness theory, would shift my views somewhat on this point.

  • Resistance to feedback loops. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments. This is a problem because of (a) its extremely ambitious goals (among other things, it seeks to develop artificial intelligence and “Friendliness theory” before anyone else can develop artificial intelligence); (b) its view of its staff/supporters as having unusual insight into rationality, which I discuss in a later bullet point.

    SI’s list of achievements is not, in my view, up to where it needs to be given (a) and (b). Yet I have seen no declaration that SI has fallen short to date, nor an explanation of what will be changed to deal with it. SI’s recent release of a strategic plan and monthly updates are improvements from a transparency perspective, but they still leave me feeling as though there are no clear metrics or goals by which SI is committing to be measured (aside from very basic organizational goals such as “design a new website” and very vague goals such as “publish more papers”) and as though SI places a low priority on engaging people who are critical of its views (or at least not yet on board), as opposed to people who are naturally drawn to it.

    I believe that one of the primary obstacles to being impactful as a nonprofit is the lack of the sort of helpful feedback loops that lead to success in other domains. I like to see groups that are making as much effort as they can to create meaningful feedback loops for themselves. I perceive SI as falling well short on this front. Pursuing more impressive endorsements and developing benign but objectively recognizable innovations (particularly commercially viable ones) are two possible ways to impose more demanding feedback loops. (I discussed both of these in my interview linked above.)

  • Apparent poorly grounded belief in SI’s superior general rationality. Many of the things that SI and its supporters and advocates say imply a belief that they have special insights into the nature of general rationality, and/or have superior general rationality, relative to the rest of the population. (Examples here, here and here.) My understanding is that SI is in the process of spinning off a group dedicated to training people on how to have higher general rationality.

    Yet I’m not aware of anything I would consider compelling evidence that SI staff/supporters/advocates have any special insight into the nature of general rationality, or that they have especially high general rationality.

    I have been pointed to the Sequences on this point. The Sequences (which I have read the vast majority of) do not seem to me to be a demonstration or evidence of general rationality. They are about rationality; I find them very enjoyable to read; and there is very little they say that I disagree with (or would have disagreed with before I read them). However, they do not seem to demonstrate rationality on the part of the writer, any more than a series of enjoyable, not-obviously-inaccurate essays on the qualities of a good basketball player would demonstrate basketball prowess. I sometimes get the impression that fans of the Sequences are willing to ascribe superior rationality to the writer simply because the content seems smart and insightful to them, without making a critical effort to determine the extent to which the content is novel, actionable and important.

    I endorse Eliezer Yudkowsky’s statement, “Be careful … any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility.” To me, the best evidence of superior general rationality (or of insight into it) would be objectively impressive achievements (successful commercial ventures, highly prestigious awards, clear innovations, etc.) and/or accumulation of wealth and power. As mentioned above, SI staff/supporters/advocates do not seem particularly impressive on these fronts, at least not as much as I would expect for people who have the sort of insight into rationality that makes it sensible for them to train others in it. I am open to other evidence that SI staff/supporters/advocates have superior general rationality, but I have not seen it.

    Why is it a problem if SI staff/supporters/advocates believe themselves, without good evidence, to have superior general rationality? First, it strikes me as a belief based on wishful thinking rather than rational inference. Second, I would expect a series of problems to accompany overconfidence in one’s general rationality, and several of these problems seem to be actually occurring in SI’s case:

    • Insufficient self-skepticism given how strong its claims are and how little support its claims have won. Rather than endorsing “Others have not accepted our arguments, so we will sharpen and/or reexamine our arguments,” SI seems often to endorse something more like “Others have not accepted our arguments because they have inferior general rationality,” a stance less likely to lead to improvement on SI’s part.

    • Being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously.

    • Paying insufficient attention to the limitations of the confidence one can have in one’s untested theories, in line with my Objection 1.

  • Overall disconnect between SI’s goals and its activities. SI seeks to build FAI and/or to develop and promote “Friendliness theory” that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory. Its per-person output in terms of publications seems low. Its core staff seem more focused on Less Wrong posts, “rationality training” and other activities that don’t seem connected to the core goals; Eliezer Yudkowsky, in particular, appears (from the strategic plan) to be focused on writing books for popular consumption. These activities seem neither to be advancing the state of FAI-related theory nor to be engaging the sort of people most likely to be crucial for building AGI.

    A possible justification for these activities is that SI is seeking to promote greater general rationality, which over time will lead to more and better support for its mission. But if this is SI’s core activity, it becomes even more important to test the hypothesis that SI’s views are in fact rooted in superior general rationality—and these tests don’t seem to be happening, as discussed above.

  • Theft. I am bothered by the 2009 theft of $118,803.00 (as against a $541,080.00 budget for the year). In an organization as small as SI, it really seems as though theft that large relative to the budget shouldn’t occur and that it represents a major failure of hiring and/or internal controls.

    In addition, I have seen no public SI-authorized discussion of the matter that I consider to be satisfactory in terms of explaining what happened and what the current status of the case is on an ongoing basis. Some details may have to be omitted, but a clear SI-authorized statement on this point, with as much information as can reasonably be provided, would be helpful.

A couple of positive observations to add context here:

  • I see significant positive qualities in many of the people associated with SI. I especially like what I perceive as their sincere wish to do whatever they can to help the world as much as possible, and the high value they place on being right as opposed to being conventional or polite. I have not interacted with Eliezer Yudkowsky but I greatly enjoy his writings.

  • I’m aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years, particularly regarding the last couple of statements listed above. That said, SI is an organization and it seems reasonable to judge it by its organizational track record, especially when its new leadership is so new that I have little basis on which to judge these staff.


While SI has produced a lot of content that I find interesting and enjoyable, it has not produced what I consider evidence of superior general rationality or of its suitability for the tasks it has set for itself. I see no qualifications or achievements that specifically seem to indicate that SI staff are well-suited to the challenge of understanding the key AI-related issues and/or coordinating the construction of an FAI. And I see specific reasons to be pessimistic about its suitability and general competence.

When estimating the expected value of an endeavor, it is natural to have an implicit “survivorship bias”—to use organizations whose accomplishments one is familiar with (which tend to be relatively effective organizations) as a reference class. Because of this, I would be extremely wary of investing in an organization with apparently poor general competence/suitability to its tasks, even if I bought fully into its mission (which I do not) and saw no other groups working on a comparable mission.

But if there’s even a chance …

A common argument that SI supporters raise with me is along the lines of, “Even if SI’s arguments are weak and its staff isn’t as capable as one would like to see, their goal is so important that they would be a good investment even at a tiny probability of success.”
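The structure of this argument can be made explicit with a simple expected-value sketch (the numbers below are purely illustrative; they are not figures anyone has actually claimed):

```latex
% Expected value of a donation, on the "even if there's a chance" view.
%   p = probability that the donation averts an existential catastrophe
%   V = value assigned to averting it (e.g., number of future lives)
\[
  \mathbb{E}[\text{value}] = p \cdot V
\]
% Even with an illustrative p = 10^{-10}, positing V = 10^{20} lives gives
\[
  \mathbb{E}[\text{value}] = 10^{-10} \cdot 10^{20} = 10^{10}\ \text{lives},
\]
% an enormous figure despite the tiny probability. Because V can always be
% posited large enough that p * V dominates any alternative use of funds,
% the conclusion goes through for arbitrarily small p.
```

It is this unboundedness, where the size of the claimed stakes does all the work regardless of how small the probability is, that I address below.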

I believe this argument to be a form of Pascal’s Mugging and I have outlined the reasons I believe it to be invalid in two posts (here and here). There have been some objections to my arguments, but I still believe them to be valid. There is a good chance I will revisit these topics in the future, because I believe these issues to be at the core of many of the differences between GiveWell-top-charities supporters and SI supporters.

Regardless of whether one accepts my specific arguments, it is worth noting that the most prominent people associated with SI tend to agree with the conclusion that the “But if there’s even a chance …” argument is not valid. (See comments on my post from Michael Vassar and Eliezer Yudkowsky, as well as Eliezer’s interview with John Baez.)

Existential risk reduction as a cause

I consider the general cause of “looking for ways that philanthropic dollars can reduce direct threats of global catastrophic risks, particularly those that involve some risk of human extinction” to be a relatively high-potential cause. It is on the working agenda for GiveWell Labs and we will be writing more about it.

However, I don’t consider “Cause X is the one I care about and Organization Y is the only one working on it” to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future. (One way to accomplish this would be to create a fund with “existential risk” in the name; this is a fairly easy thing to do and one person could do it on behalf of multiple donors.)

For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI’s mission than donating to SI—through incentive effects alone (not to mention my specific argument that SI’s approach to “Friendliness” seems likely to increase risks).

How I might change my views

My views are very open to revision.

However, I cannot realistically commit to reading and seriously considering all comments posted on the matter. The number of people capable of taking a few minutes to write a comment is sufficient to swamp my capacity. I do encourage people to comment and I do intend to read at least some comments, but if you are looking to change my views, you should not consider posting a comment to be the most promising route.

Instead, what I will commit to is reading and carefully considering up to 50,000 words of content that are (a) specifically marked as SI-authorized responses to the points I have raised; (b) explicitly cleared for release to the general public as SI-authorized communications. In order to consider a response “SI-authorized and cleared for release,” I will accept explicit communication from SI’s Executive Director or from a majority of its Board of Directors endorsing the content in question. After 50,000 words, I may change my views and/or commit to reading more content, or (if I determine that the content is poor and is not using my time efficiently) I may decide not to engage further. SI-authorized content may improve or worsen SI’s standing in my estimation, so unlike with comments, there is an incentive to select content that uses my time efficiently. Of course, SI-authorized content may end up including excerpts from comment responses to this post, and/or already-existing public content.

I may also change my views for other reasons, particularly if SI secures more impressive achievements and/or endorsements.

One more note: I believe I have read the vast majority of the Sequences, including the AI-foom debate, and that this content—while interesting and enjoyable—does not have much relevance for the arguments I’ve made.

Again: I think that whatever happens as a result of my post will be positive for SI’s mission, whether or not it is positive for SI as an organization. I believe that most of SI’s supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.


Thanks to the following people for reviewing a draft of this post and providing thoughtful feedback (this of course does not mean they agree with the post or are responsible for its content): Dario Amodei, Nick Beckstead, Elie Hassenfeld, Alexander Kruel, Tim Ogden, John Salvatier, Jonah Sinick, Cari Tuna, Stephanie Wykstra.