Good and bad ways to think about downside risks

This post was writ­ten for Con­ver­gence Anal­y­sis.

Many ac­tions we could take to make the world bet­ter might also have nega­tive effects, or might even be nega­tive over­all. In other words, al­tru­is­tic ac­tions of­ten have down­side risks. Per­haps, for ex­am­ple, that pro­ject you might start, ca­reer path you might pur­sue, or ar­ti­cle you might write could lead to in­for­ma­tion haz­ards, memetic down­side risks, or risks of di­vert­ing re­sources (such as money or at­ten­tion) from more valuable things.[1]

So if I’m con­sid­er­ing do­ing some­thing to make the world bet­ter, but know it might have down­side risks, what should I do? How should I even think about that ques­tion?

  1. Should I just stop wor­ry­ing and go ahead, since I’m ex­cited about this idea and my ini­tial sense is that it’ll prob­a­bly be benefi­cial?

  2. Or should I be es­pe­cially con­cerned about en­sur­ing my ac­tions don’t cause any harm, and thus steer heav­ily against any ac­tion with down­side risks, even if the ex­pected value of that ac­tion still seems good?

  3. Or maybe that’s too strong—per­haps it might be fine for me to go ahead, and I cer­tainly still want to, to have that pos­i­tive im­pact. But maybe I have a duty to at least think care­fully first, and to drop the idea if it is too risky, since these are poli­cies I’d want peo­ple in gen­eral to com­ply with?

  4. Or maybe I shouldn’t see this as just a mat­ter of “com­pli­ance”—as some “duty” sep­a­rate from hav­ing a pos­i­tive im­pact. Maybe I should see avoid­ing caus­ing ac­ci­den­tal harm as just as valuable as “ac­tively mak­ing things bet­ter”—as just an­other core part of my efforts to do good?

We (Con­ver­gence) have ob­served differ­ent peo­ple seem­ing to im­plic­itly use each of these four broad per­spec­tives on down­side risks.[2] We’ll re­fer to these per­spec­tives as, re­spec­tively:

  1. The un­con­cerned perspective

  2. The harm-avoidance per­spec­tive[3]

  3. The com­pli­ance perspective

  4. The pure ex­pected value (or pure EV) perspective

In this post, we’ll un­pack these four per­spec­tives, and we’ll ar­gue in favour of us­ing the pure EV per­spec­tive. Note that this doesn’t re­quire always perform­ing ex­plicit EV calcu­la­tions; of­ten only a quick, qual­i­ta­tive, in­tu­itive as­sess­ment of EV will be war­ranted. Re­lat­edly, this ar­ti­cle is not about meth­ods for es­ti­mat­ing EV (see here for links rele­vant to that topic). In­stead, this ar­ti­cle fo­cuses on ar­gu­ing that, in essence, one should con­sider both po­ten­tial benefits and po­ten­tial harms of an ac­tion (com­pared to the coun­ter­fac­tual), with­out ig­nor­ing down­side risks, over­weight­ing them, or feel­ing that avoid­ing them “limits one’s im­pact”.

We’ll take a typ­i­cal, “ap­prox­i­mately con­se­quen­tial­ist” eth­i­cal frame­work as a start­ing as­sump­tion. We think that the pure EV per­spec­tive is the one which fits most nat­u­rally with such a frame­work. But our ar­gu­ments for it will not be ex­plic­itly fo­cused on philo­soph­i­cal de­bates.[4] In­stead, we’ll use a more nar­ra­tive, con­ver­sa­tional, and emo­tion- or mo­ti­va­tion-fo­cused ap­proach.

This is partly be­cause we ex­pect most read­ers to already ac­cept an “ap­prox­i­mately con­se­quen­tial­ist” eth­i­cal frame­work, and so we don’t ex­pect many read­ers to be de­liber­ately and ex­plic­itly us­ing the un­con­cerned, harm-avoidance, or com­pli­ance per­spec­tives. In cases where read­ers are us­ing those per­spec­tives, we ex­pect this to typ­i­cally be im­plicit or un­in­ten­tional.[5] Thus, we’re aiming less to “change peo­ple’s minds”, and more to ex­plic­itly high­light this menu of op­tions, while mak­ing the pure EV per­spec­tive ap­petis­ing to read­ers’ Sys­tem 1s (rather than just their Sys­tem 2s).

1. The un­con­cerned perspective

Say you’ve got this great op­por­tu­nity or idea for a pro­ject/​ca­reer path/​ar­ti­cle/​some­thing else. To make things con­crete, let’s say a jour­nal­ist has reached out to you re­quest­ing to in­ter­view you about AI safety. This in­ter­view re­ally could make the world bet­ter, by high­light­ing the im­por­tance of the prob­lem, point­ing to some po­ten­tial solu­tions, and hope­fully at­tract­ing more fund­ing, at­ten­tion, and tal­ent to the area. And that’s ex­cit­ing—you could have a real im­pact!

But is there a chance you doing this interview could also have negative impacts (compared to the counterfactual, which might be someone else being interviewed or no interview occurring)? For example, perhaps highlighting the importance of AI safety would also increase risks by making certain powerful actors more interested in AI (an “attention hazard”)? Or perhaps what you say would be spun as, or evolve into, a simplistic message that makes discussion of AI safety more generally look unfounded, overdramatic, or overly “sci-fi”? Could the interview’s impact even be negative overall? Have you really thought about the possible unintended consequences?

Maybe you don’t have to. Maybe it’d be re­ally nice not to—you’re ex­cited, and do­ing the in­ter­view seems like a good idea, and one that’s more promis­ing than the other ac­tions you have on the table. So maybe you can just go ahead.

In fact, maybe you should just go ahead—wouldn’t wor­ry­ing about pos­si­ble risks be­fore ev­ery seem­ingly pos­i­tive ac­tion, or avoid­ing any ac­tion that could be harm­ful, just paralyse would-be do-good­ers, and ac­tu­ally leave the world worse off over­all?

We think there’s some val­idity to this per­spec­tive. But it seems to set up a false di­chotomy, as if a per­son’s only op­tions are to:

  1. Go ahead with any ac­tions that seem pos­i­tive at first, and never think at all about down­side risks; or

  2. Worry at length about ev­ery sin­gle ac­tion.

We could add a third op­tion, con­sist­ing of the fol­low­ing heuris­tics (see here for more de­tails):

  • For minor or rou­tine ac­tions (e.g., most con­ver­sa­tions), don’t bother think­ing at all about down­side risks.

  • Typ­i­cally limit think­ing about down­side risks to a quite quick check, and with­out re­ally “wor­ry­ing”.

    • For ex­am­ple, when the jour­nal­ist reaches out to you, you no­tice this isn’t a situ­a­tion you’ve en­coun­tered be­fore, so you spend a minute quickly con­sid­er­ing what harms might come from this ac­tion.

  • Think at more length about down­side risks only in cases where it seems that would be worth­while (e.g., be­fore ma­jor ac­tions, or when the quick check re­vealed there may be im­por­tant down­sides).

    • For ex­am­ple, when your minute of thought re­veals some seem­ingly plau­si­ble down­sides to the in­ter­view, you de­cide to think in more de­tail about the mat­ter, and maybe con­sult with some peo­ple you trust.

  • When sub­stan­tial down­side risks are iden­ti­fied, con­sider ei­ther (a) aban­don­ing the ac­tion or (b) tak­ing a ver­sion of the ac­tion which has lower risks or which al­lows you to mon­i­tor the risks to in­form whether to aban­don the ac­tion later.
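
The heuristics above amount to a rough triage rule. As a minimal sketch in Python (the function name, its inputs, and the three effort labels are hypothetical, chosen only for illustration; in practice this judgment is informal rather than computed):

```python
def downside_risk_effort(routine: bool, major: bool, quick_check_flagged: bool) -> str:
    """Suggest how much thought to give downside risks before acting.

    routine: the action is minor or routine (e.g., most conversations).
    major: the action is a major one.
    quick_check_flagged: a quick check turned up plausible, important downsides.
    """
    if routine:
        return "none"          # don't bother thinking about downside risks
    if major or quick_check_flagged:
        return "extended"      # think at more length, maybe consult people you trust
    return "quick check"       # default: a brief look, without really worrying


# The journalist example: not routine, and the quick check found plausible downsides.
print(downside_risk_effort(routine=False, major=False, quick_check_flagged=True))  # extended
```

An "extended" result only says to look more closely; whether to abandon or modify the action is a separate question.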

That third op­tion, or some­thing like it, doesn’t have to in­volve emo­tional an­guish or anal­y­sis paral­y­sis. Good deeds can still be done, and you can still feel good about do­ing them.

And ask your­self: Why might it feel nice to take the un­con­cerned per­spec­tive? Isn’t it so you’d get to keep feel­ing ex­cite­ment about the pos­i­tive im­pact this ac­tion might have? Well, if you fol­low some­thing like that third op­tion, you can still feel that ex­cite­ment, in all the cases where the ac­tion does have a pos­i­tive im­pact (in ex­pec­ta­tion).

You only lose that excitement in the cases where the action would actually be net negative in expectation.[6] But wasn’t your excitement all about the positive impact? Isn’t it therefore appropriate for that excitement to be lost if the impact isn’t positive? “That which can be destroyed by the truth should be.”

2. The harm-avoidance perspective

Maybe you’re at the op­po­site ex­treme—maybe that third op­tion doesn’t feel like enough. Maybe when you see us sug­gest you should some­times not think at all about down­side risks, and some­times go ahead even if there are risks, you won­der: Isn’t that reck­less? How could you bear mak­ing the world worse in some ways, or risk­ing mak­ing it worse over­all—how could you bear the chance of your own ac­tions caus­ing harm? Can the po­ten­tial benefits re­ally jus­tify that?

We again think there’s some val­idity to this per­spec­tive. And we cer­tainly un­der­stand the pull to­wards it. In­tu­itively, it can feel like there’s a strong asym­me­try be­tween caus­ing harm and caus­ing benefits, and a similar asym­me­try be­tween do­ing harm and “merely al­low­ing” harm. It can feel like we have a strong duty to avoid ac­tions that harm things, or that risk do­ing so, even when those ac­tions are nec­es­sary “for the greater good”, or to pre­vent other harms.

From here, we could move into a de­bate over con­se­quen­tial­ism vs non-con­se­quen­tial­ism. But as we noted ear­lier, we’re tak­ing an ap­prox­i­mately con­se­quen­tial­ist frame­work as a start­ing as­sump­tion, and pri­mar­ily ad­dress­ing peo­ple who we think share that frame­work, but who still feel a pull to­wards this harm-avoidance per­spec­tive. So we’ll in­stead try to ham­mer home just how paralysing and im­prac­ti­cal it’d be to fully al­ign your be­havi­ours with this harm-avoidance per­spec­tive.

We live in a com­pli­cated world, pop­u­lated with at least billions of be­ings we care about, each with var­i­ous, of­ten volatile, of­ten con­flict­ing prefer­ences, each net­worked and in­ter­act­ing in myr­iad, of­ten ob­scured, of­ten im­plicit ways. And if you want to make that world bet­ter, that means you want to change it. If you poke, tweak, add, or re­move any gear in that mas­sive ma­chine, the im­pacts won’t just be lo­cal and sim­ple—there’ll be re­ver­ber­a­tions in dis­tant cor­ners you hadn’t ever thought about.

Those re­ver­ber­a­tions are cer­tainly worth tak­ing se­ri­ously. This is why it’s worth think­ing about and try­ing to miti­gate down­side risks—in­clud­ing those far re­moved from the time, place, or in­ten­tion of your ac­tion. This is also why we wrote a se­ries of posts on down­side risks.

But re­ver­ber­a­tions are always hap­pen­ing, with or with­out you. And any­thing you do will cause them. And so would what­ever you in­ter­pret “not do­ing any­thing” to mean.

So are you re­ally go­ing to always stand by—let the ma­chine keep grind­ing down who­ever or what­ever it may be grind­ing down; let it chug along to­wards what­ever cliffs it may be chug­ging along to­wards—just to avoid do­ing any harm? Just to avoid any risk of it be­ing you who “causes” harm—even if avoid­ing that risk means more harm hap­pens?

Are you go­ing to stand by even when your best guess is that an ac­tion re­ally is pos­i­tive in ex­pec­ta­tion?

For ex­am­ple, if you can see any down­side risks from do­ing the in­ter­view on AI safety with that jour­nal­ist, the harm-avoidance per­spec­tive would sug­gest definitely turn­ing the in­ter­view down. This is even if, af­ter seek­ing out the views of peo­ple you trust and putting in a lot of care­ful thought, it re­ally does seem the down­side risks are out­weighed by the po­ten­tial benefits. And this is even though, if the EV is in­deed pos­i­tive, it’s pre­dictable that var­i­ous other harms will oc­cur if you turn the in­ter­view down, such as harms from some­one less qual­ified be­ing in­ter­viewed or from AI safety con­tin­u­ing to be ne­glected.

As you can prob­a­bly tell, we ad­vo­cate against this harm-avoidance per­spec­tive. We ad­vo­cate for tak­ing down­side risks se­ri­ously—per­haps more se­ri­ously than most peo­ple cur­rently do—but also for be­ing will­ing to take ac­tion when the EV re­ally is pos­i­tive (as best you can tell).

3. The com­pli­ance perspective

Let’s say you’re con­vinced by the ar­gu­ments above, so you’re ditch­ing the harm-avoidance per­spec­tive, and you can go ahead with an ac­tion—that AI safety in­ter­view, for ex­am­ple—as long as its EV is pos­i­tive. Ex­cel­lent! You’re still pretty ex­cited by the im­pact this in­ter­view could have.

But down­side risks are a real con­cern, so you can’t take the un­con­cerned per­spec­tive ei­ther. You do have to do your due dili­gence, right? You’d ad­vise oth­ers to think about the risks of their ac­tions—you think that’s a good norm in gen­eral—so you guess you have a re­spon­si­bil­ity to com­ply with it too? And you guess that if the in­ter­view does turn out to seem too risky, you’d have to turn it down—how­ever an­noy­ing it’d be to thereby give up this chance to pos­si­bly cause some pos­i­tive im­pacts too.

We call this the “compliance” perspective. In some ways, this perspective actually seems pretty ok to us; depending on the details, it might not be “invalid” in any way, might not actually clash with a mostly consequentialist framework, and might not cause issues. But we think there are many people for whom this perspective probably isn’t ideal, in terms of motivation.

That’s be­cause the per­spec­tive frames cau­tion about down­side risks as some­thing like a spe­cific, ex­ter­nal obli­ga­tion, sec­ondary to one’s real, main goal of hav­ing a pos­i­tive im­pact. It frames cau­tion as the sort of thing “a good per­son should do”, but not as it­self good and im­pact­ful in the way that “do­ing a pos­i­tive ac­tion” would be. It could make cau­tion seem, on some emo­tional level, like a bur­den­some duty, get­ting in the way of the ac­tu­ally im­pact­ful things you re­ally want to do.

And if cau­tion does feel like just com­pli­ance, it might also feel frus­trat­ing and de­mo­ti­vat­ing. So you might ap­ply cau­tion a lit­tle too rarely, or a lit­tle too lightly. You might con­vince your­self the risks are worth­while a lit­tle too of­ten. And bit by bit, we might see an ac­cu­mu­la­tion of harms we could’ve pre­vented by tak­ing a less frus­trat­ing, more in­trin­si­cally mo­ti­vat­ing per­spec­tive on down­side risks.

How can we avoid those is­sues? By tak­ing the pure ex­pected value per­spec­tive. The next sec­tion will de­scribe what it looks like to take that per­spec­tive.

4. The pure ex­pected value perspective

Let’s start again from the top. This jour­nal­ist has reached out to you re­quest­ing an in­ter­view on AI safety, and you re­al­ise this might be a way to make the world bet­ter. Fan­tas­tic! But you haven’t re­ally thought about the down­sides yet: it’s pos­si­ble that, in ex­pec­ta­tion, the in­ter­view would be net nega­tive. It’s also pos­si­ble that it would be a bit nega­tive, and that you can make some changes to miti­gate those down­sides.

No prob­lem! Just as­sess the EV of you do­ing the in­ter­view (com­pared to the coun­ter­fac­tual), tak­ing ac­count of both its risks and its po­ten­tial benefits, and ad­just­ing for the unilat­er­al­ist’s curse where nec­es­sary. This could in­volve any­thing from a quick check to a de­tailed, lengthy as­sess­ment in­volv­ing in­put from oth­ers; con­sider how high the value of in­for­ma­tion would be.[7] And this may or may not in­volve ex­plicit, quan­ti­ta­tive es­ti­mates. For ex­am­ple, you might sim­ply spend 30 min­utes con­sid­er­ing the var­i­ous effects the in­ter­view might have, qual­i­ta­tively weigh­ing up how prob­a­ble and how good or bad they are, and ar­riv­ing at an over­all sense of whether the benefits out­weigh the risks. [8]
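
For cases where explicit estimates do seem worthwhile, here is a minimal sketch in Python of what such a calculation might look like. The probabilities and values are made up purely for illustration and are not estimates from this post; the names `Outcome` and `expected_value` are hypothetical. Values are measured relative to the counterfactual, so zero means "no better or worse than you not doing the interview", and the sketch assumes the effects can simply be added together.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcome:
    """One possible effect of the action, valued relative to the counterfactual."""
    description: str
    probability: float  # chance this effect occurs (illustrative guess)
    value: float        # positive = benefit, negative = harm, arbitrary units


def expected_value(outcomes: Sequence[Outcome]) -> float:
    """Sum of probability-weighted values, treating effects as additive."""
    return sum(o.probability * o.value for o in outcomes)


# Hypothetical numbers for the interview example; none come from the post.
interview = [
    Outcome("more funding, attention, and talent for AI safety", 0.5, 10.0),
    Outcome("attention hazard: powerful actors get more interested in AI", 0.25, -8.0),
    Outcome("message gets spun as overly sci-fi", 0.25, -4.0),
]

ev = expected_value(interview)
print(f"EV relative to counterfactual: {ev:+.2f}")  # EV relative to counterfactual: +2.00
```

The downside risks here count against the benefits with exactly their probability-weighted size: they are neither ignored nor given extra weight for being harms.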

If, af­ter that, you’re fairly con­fi­dent the in­ter­view’s im­pacts re­ally are net pos­i­tive in ex­pec­ta­tion—great, go ahead and do it!

If it seems the expected impacts could be made net positive (or more positive) if you modify the action to reduce its risks or allow you to monitor them, then great, go ahead and do that! In the interview example, this could include things like asking to provide written rather than verbal answers, and running those answers by people you trust.

If it seems the in­ter­view’s im­pacts are net-nega­tive in ex­pec­ta­tion, and that that can’t be fixed by just mod­ify­ing the ac­tion to miti­gate or mon­i­tor those risks—well, maybe that’s not great, but it’s definitely great you found out! Think about all the harm you pre­vented by as­sess­ing the down­side risks so you can now avoid go­ing through with this ac­tion! Think about how much bet­ter your de­ci­sion to be cau­tious has made the world! And re­mem­ber that there’s still a vast ar­ray of other po­ten­tial ac­tions you could take—that wasn’t your only shot to make a differ­ence. (Plus, if the ex­pected effects of the ac­tion you were con­sid­er­ing look net nega­tive on close in­spec­tion, yet this wasn’t ob­vi­ous from the start, you might be able to do fur­ther good by writ­ing up and shar­ing these in­sights with other peo­ple.)

The “pure EV” per­spec­tive re­jects the un­con­cerned per­spec­tive’s strange in­cli­na­tion to avoid look­ing too closely at the po­ten­tial risks of a planned ac­tion in case that’d burst your bub­ble. It also re­jects the harm-avoidance per­spec­tive’s em­pha­sis on steer­ing clear of any ac­tion with any down­side risk, based on per­ceived asym­me­tries be­tween caus­ing harm and caus­ing good, or be­tween do­ing and al­low­ing harm. Fur­ther, it re­jects the com­pli­ance per­spec­tive’s sense that pre­vent­ing down­side risks from your ac­tions is some ad­di­tional, sec­ondary prin­ci­ple you have a duty to com­ply with, rather than a core part of how you can pos­i­tively im­pact the world.

In place of those things, this per­spec­tive sim­ply says to:

  1. In­vest an ap­pro­pri­ate level of effort into work­ing out the EV of an ac­tion and into think­ing of ways to im­prove that EV (such as through miti­gat­ing and/​or mon­i­tor­ing any down­side risks).

  2. Take the ac­tion (or the best ver­sion of it) if that EV is pos­i­tive (af­ter ad­just­ing for the unilat­er­al­ist’s curse, where nec­es­sary).

  3. Feel good about both steps of that pro­cess.
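
The first two steps can be sketched as a small decision rule in Python. This is only an illustration under stated assumptions: the candidate versions and their EV figures are hypothetical, and `curse_discount` is a crude stand-in for adjusting for the unilateralist's curse, not a method this post prescribes.

```python
from typing import Mapping, Optional


def choose_version(evs: Mapping[str, float], curse_discount: float = 0.0) -> Optional[str]:
    """Return the version of the action with the highest adjusted EV, or None.

    `evs` maps each candidate version to its estimated EV relative to the
    counterfactual. `curse_discount` is subtracted from every estimate as a
    crude, hypothetical adjustment for the unilateralist's curse.
    """
    adjusted = {name: ev - curse_discount for name, ev in evs.items()}
    best = max(adjusted, key=adjusted.get, default=None)
    if best is None or adjusted[best] <= 0:
        return None  # no version is net positive in expectation: don't act
    return best


# Hypothetical estimates for two versions of the interview (not from the post).
versions = {
    "live verbal interview": -1.0,
    "written answers reviewed by trusted people": 2.0,
}
print(choose_version(versions))                      # written answers reviewed by trusted people
print(choose_version(versions, curse_discount=3.0))  # None
```

A `None` result is not a failure of the process. It is the case where checking prevented an action that looked net negative, which is as much a part of doing good as going ahead would have been.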

Thus, this is the per­spec­tive we use our­selves, and the one we recom­mend. We hope that it will help us and oth­ers strike the best bal­ance be­tween tak­ing ac­tions that are worth tak­ing and avoid­ing ac­tions that truly are too risky.

Clos­ing remarks

We hope this post will be helpful in your efforts to im­prove the world. Ad­di­tion­ally, if you do sub­scribe to a mostly con­se­quen­tial­ist frame­work, and yet feel a pull you’d rather not feel to­wards the un­con­cerned, harm-avoidance, or com­pli­ance per­spec­tives, we hope this post will help you bet­ter al­ign your pat­terns of thought and feel­ing with those which you aim to cul­ti­vate.

For discussion of types of downside risks, situations in which they’re most likely to occur, how and when to assess them, or how to prevent or mitigate them, see other posts in this sequence.

My thanks to Justin Shov­e­lain for helping de­velop the ideas in this post, and to Justin, David Kristoffers­son, Olga Babeeva, and Max Dal­ton for helpful feed­back. This does not im­ply their en­dorse­ment of all points made.

  1. See here for ad­di­tional sources on down­side risks and ac­ci­den­tal harm. ↩︎

  2. Of course, it’s also pos­si­ble to use differ­ent ver­sions of these per­spec­tives, to use com­bi­na­tions of mul­ti­ple per­spec­tives at the same time, or to switch be­tween differ­ent per­spec­tives in differ­ent situ­a­tions. ↩︎

  3. There is also a per­son­al­ity trait called “harm avoidance”, which is not what we’re refer­ring to in this post. ↩︎

  4. Some of the de­bates that are rele­vant here, but which we won’t ex­plic­itly ad­dress, are those re­gard­ing con­se­quen­tial­ism vs non-con­se­quen­tial­ism, do­ing vs al­low­ing harm, the acts/​omis­sions doc­trine, eth­i­cal offset­ting, “ex­cited” vs “obli­ga­tory” al­tru­ism, and risk-neu­tral vs risk-averse vs risk-seek­ing prefer­ences. ↩︎

  5. A per­son’s use of one of those three per­spec­tives could per­haps re­sult from habits and in­tu­itions shaped by “com­mon sense”, the be­havi­ours and at­ti­tudes of peo­ple around the per­son, or ex­pe­rience in fields that lack a long-term, al­tru­is­tic ethic. ↩︎

  6. You may also lose some of the ex­cite­ment in cases where the ac­tion still seems net-pos­i­tive in ex­pec­ta­tion, but less so, due to risks that are no­table but not over­whelming. But that par­tial loss of ex­cite­ment would seem to us like an in­stance of emo­tions track­ing re­al­ity ap­pro­pri­ately. ↩︎

  7. For ex­am­ple, it’s worth putting more effort into as­sess­ing the EV of an ac­tion the more un­cer­tain you are about the prob­a­bil­ity and value/​dis­value of effects the ac­tion might have, the big­ger those prob­a­bil­ities and val­ues/​dis­val­ues might be, and the like­lier that ex­tra effort is to re­solve those un­cer­tain­ties. ↩︎

  8. As noted ear­lier, this post is not fo­cused on meth­ods for es­ti­mat­ing EV, and more in­for­ma­tion on that can be found in the sources linked to here. ↩︎