Forecasting Newsletter: July 2020.


  • So­cial Science Pre­dic­tion Plat­form launches.

  • Ioan­ni­dis and Taleb dis­cuss op­ti­mal re­sponse to COVID-19.

  • Re­port tries to fore­see the (po­ten­tially quite high) div­i­dends of con­flict pre­ven­tion from 2020 to 2030.


  • High­lights.

  • Pre­dic­tion Mar­kets & Fore­cast­ing Plat­forms.

  • New un­der­tak­ings.

  • Nega­tive Ex­am­ples.

  • News & Hard to Cat­e­go­rize Con­tent.

  • Long Con­tent.

Sign up here, browse past newslet­ters here, or view it on the EA fo­rum here.

Pre­dic­tion Mar­kets & Fore­cast­ing Plat­forms.

In subjective order of importance:

  • Me­tac­u­lus con­tinues host­ing great dis­cus­sion.

  • In par­tic­u­lar, it has re­cently hosted some high-qual­ity AI ques­tions.

  • User @alexrjl, a mod­er­a­tor on the plat­form, offers on the EA fo­rum to op­er­a­tional­ize ques­tions and post them on Me­tac­u­lus, for free. This hasn’t been picked up by the EA Fo­rum al­gorithms, but the offer seems to me to be quite valuable. Some ex­am­ples of things you might want to see op­er­a­tional­ized and fore­casted: the fund­ing your or­ga­ni­za­tion will re­ceive in 2020, whether any par­tic­u­larly key bills will be­come law, whether GiveWell will change their top char­i­ties, etc.

  • Foretell is a prediction market by Georgetown University’s Center for Security and Emerging Technology, focused on questions relevant to technology-security policy, and on bringing those forecasts to policy-makers.

  • Some EAs, such as my­self or a mys­te­ri­ous user named fore­told, fea­ture on the top spots of their (ad­mit­tedly quite young) leader­board.

  • I also have the op­por­tu­nity to cre­ate a team on the site: if you have a proven track record and would be in­ter­ested in join­ing such a team, get in touch be­fore the 10th of Au­gust.

  • Repli­ca­tion Markets

  • pub­lished their first paper

  • had some difficul­ties with cheaters:

    “The Team at Repli­ca­tion Mar­kets is de­lay­ing an­nounc­ing the Round 8 Sur­vey win­ners be­cause of an in­ves­ti­ga­tion into co­or­di­nated fore­cast­ing among a group of par­ti­ci­pants. As a re­sult, eleven ac­counts have been sus­pended and their data has been ex­cluded from the study. Scores are be­ing re­calcu­lated and prize an­nounce­ments will go out soon.”

    • Be­cause of how Repli­ca­tion Mar­kets are struc­tured, I’m bet­ting the cheat­ing was by ma­nipu­lat­ing the Key­ne­sian beauty con­test in a Pre­dict-O-Matic fash­ion. That is, cheaters could have co­or­di­nated to out­put some­thing sur­pris­ing dur­ing the Key­ne­sian Beauty Con­test round, and then make that sur­pris­ing thing come to hap­pen dur­ing the mar­ket trad­ing round. Charles Twardy, prin­ci­pal in­ves­ti­ga­tor at Repli­ca­tion Mar­kets, gives a more pos­i­tive take on the Key­ne­sian beauty con­test as­pects of Repli­ca­tion Mar­kets here.

  • still have Round 10 open un­til the 3rd of Au­gust.

  • At the Good Judge­ment fam­ily, Good Judge­ment An­a­lyt­ics con­tinues to provide its COVID-19 dash­board.

“Modeling is a very good way to explain how a virus will move through an unconstrained herd. But when you begin to put in constraints” (mask mandates, stay-at-home orders, social distancing) “and then the herd has agency whether they’re going to comply, at that point, human forecasters who are very smart and have read through the models, that’s where they really begin to add value.” – Marc Koehler, Vice President of Good Judgement, Inc., in a recent interview.

  • Highly Spec­u­la­tive Es­ti­mates, an in­ter­face, library and syn­tax to pro­duce dis­tri­bu­tional prob­a­bil­is­tic es­ti­mates led by Ozzie Gooen, now ac­cepts func­tions as part of its in­put, such that more com­pli­cated in­puts like the fol­low­ing are now pos­si­ble:

# Variable: Number of ice creams an unsupervised child has consumed,

# when left alone in an ice cream shop.

# Current time (hours passed)


# Scenario with lots of uncertainty

w_1 = 0.75 ## Weight for this scenario.

min_uncertain(t) = t*2

max_uncertain(t) = t*20

# Optimistic scenario

w_2 = 0.25 ## Weight for the optimistic scenario

min_optimistic(t) = 1*t

max_optimistic(t) = 3*t

mean(t) = (min_optimistic(t) + max_optimistic(t))/2

stdev(t) = t*(2)^(1/2)

# Overall guess

## A long-tailed lognormal for the uncertain scenario

## and a tight normal for the optimistic scenario

mm(min_uncertain(t) to max_uncertain(t), normal(mean(t), stdev(t)), [w_1, w_2])

## Compare with: mm(2 to 20, normal(2, 1.4142), [0.75, 0.25])
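For readers who want to poke at the same model outside the tool, here is a rough Python sketch of the mixture above. It is not how Highly Speculative Estimates is implemented. It assumes that `a to b` means a lognormal whose 90% interval runs from a to b (my reading of the syntax, not something stated above), and it takes the mean of the optimistic scenario to be the midpoint of its bounds, which is what matches the `normal(2, 1.4142)` comparison at t = 1. The function names are made up for this sketch.

```python
import math
import random

# 95th percentile of the standard normal; used to turn a 90% interval
# into lognormal parameters. The "90% interval" reading is an assumption.
Z90 = 1.6448536269514722


def lognormal_from_90ci(low, high):
    """Return (mu, sigma) of a lognormal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


def sample_ice_creams(t, rng):
    """Draw one sample from the two-scenario mixture at time t (hours)."""
    w_uncertain = 0.75  # weight of the uncertain scenario; the rest is optimistic
    if rng.random() < w_uncertain:
        mu, sigma = lognormal_from_90ci(2 * t, 20 * t)
        return rng.lognormvariate(mu, sigma)
    mean = (1 * t + 3 * t) / 2   # midpoint of the optimistic bounds
    stdev = t * math.sqrt(2)
    return rng.gauss(mean, stdev)


rng = random.Random(0)
samples = [sample_ice_creams(1, rng) for _ in range(10_000)]
print(f"mean after 1 hour: {sum(samples) / len(samples):.2f} ice creams")
```

Sampling is the crudest way to work with a mixture, but it makes the weights easy to sanity-check: roughly three quarters of the draws should come from the long-tailed scenario.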

“Cur­rent pre­dic­tion mar­kets are so bad in so many differ­ent ways that it sim­ply is not sur­pris­ing for peo­ple to know bet­ter than them, and it of­ten is not pos­si­ble for peo­ple to make money from know­ing bet­ter.”

  • Augur, a betting platform built on top of Ethereum, launches v2. Here are two overviews of the platform and of v2 modifications.

New undertakings.

A new re­sult builds on the con­sen­sus, or lack thereof, in an area and is of­ten eval­u­ated for how sur­pris­ing, or not, it is. In turn, the novel re­sult will lead to an up­dat­ing of views. Yet we do not have a sys­tem­atic pro­ce­dure to cap­ture the sci­en­tific views prior to a study, nor the up­dat­ing that takes place af­ter­ward. What did peo­ple pre­dict the study would find? How would know­ing this re­sult af­fect the pre­dic­tion of find­ings of fu­ture, re­lated stud­ies?

A sec­ond benefit of col­lect­ing pre­dic­tions is that they [...] can also po­ten­tially help to miti­gate pub­li­ca­tion bias. How­ever, if pri­ors are col­lected be­fore car­ry­ing out a study, the re­sults can be com­pared to the av­er­age ex­pert pre­dic­tion, rather than to the null hy­poth­e­sis of no effect. This would al­low re­searchers to con­firm that some re­sults were un­ex­pected, po­ten­tially mak­ing them more in­ter­est­ing and in­for­ma­tive, be­cause they in­di­cate re­jec­tion of a prior held by the re­search com­mu­nity; this could con­tribute to alle­vi­at­ing pub­li­ca­tion bias against null re­sults.

A third benefit of col­lect­ing pre­dic­tions sys­tem­at­i­cally is that it makes it pos­si­ble to im­prove the ac­cu­racy of pre­dic­tions. In turn, this may help with ex­per­i­men­tal de­sign.

  • On the one hand, I could imag­ine this hav­ing an im­pact, and the en­thu­si­asm of the founders is con­ta­gious. On the other hand, as a fore­caster I don’t feel en­ticed by the plat­form: they offer a $25 re­ward to grad stu­dents (which I am not), and don’t spell it out for me why I would want to fore­cast on their plat­form as op­posed to on all the other al­ter­na­tives available to me, even ac­count­ing for al­tru­is­tic im­pact.

  • Ought is a re­search lab build­ing tools to del­e­gate open-ended rea­son­ing to AI & ML sys­tems.

  • Since con­clud­ing their ini­tial fac­tored cog­ni­tion ex­per­i­ments in 2019, they’ve been build­ing tools to cap­ture and au­to­mate the rea­son­ing pro­cess in fore­cast­ing: Ergo, a library for in­te­grat­ing model-based and judg­men­tal fore­cast­ing, and Elicit, a tool built on top of Ergo to help fore­cast­ers ex­press and share dis­tri­bu­tions.

  • They’ve re­cently run small-scale tests ex­plor­ing am­plifi­ca­tion and del­e­ga­tion of fore­cast­ing, such as: Am­plify Ro­hin’s Pre­dic­tion on AGI re­searchers & Safety Con­cerns, Am­plified fore­cast­ing: What will Buck’s in­formed pre­dic­tion of com­pute used in the largest ML train­ing run be­fore 2030 be?, and Del­e­gate a Fore­cast.

  • In addition to studying factored cognition in the forecasting context, they are broadly interested in whether the EA community could benefit from better forecasting tools: reach out to them if you want to give feedback or discuss their work.

  • The Pipeline Pro­ject is a pro­ject similar to Repli­ca­tion Mar­kets, by some of the same au­thors, to find out whether peo­ple can pre­dict whether a given study will repli­cate. They offer au­thor­ship in an ap­pendix, as well as a chance to get a to­ken mon­e­tary com­pen­sa­tion.

  • USAID’s In­tel­li­gent Fore­cast­ing: A Com­pe­ti­tion to Model Fu­ture Con­tra­cep­tive Use. “First, we will award up to 25,000 USD in prizes to in­no­va­tors who de­velop an in­tel­li­gent fore­cast­ing model—us­ing the data we provide and meth­ods such as ar­tifi­cial in­tel­li­gence (AI)—to pre­dict the con­sump­tion of con­tra­cep­tives over three months. If im­ple­mented, the model should im­prove the availa­bil­ity of con­tra­cep­tives and fam­ily plan­ning sup­plies at health ser­vice de­liv­ery sites through­out a na­tion­wide health­care sys­tem. Se­cond, we will award a Field Im­ple­men­ta­tion Grant of ap­prox­i­mately 100,000 to 200,000 USD to cus­tomize and test a high-perform­ing in­tel­li­gent fore­cast­ing model in Côte d’Ivoire.”

  • Omen is an­other cryp­tocur­rency-based pre­dic­tion mar­ket, which seems to use the same front-end (and prob­a­bly back-end) as Corona In­for­ma­tion Mar­kets. It’s un­clear what their ad­van­tages with re­spect to Augur are.

  • Yngve Høiseth releases a prediction scorer, based on his previous work on Empiricast. In Python, but also available as a REST API.

Nega­tive Ex­am­ples.

  • The International Energy Agency had terrible forecasts on solar photovoltaic energy production, until recently:

...It’s a sce­nario as­sum­ing cur­rent poli­cies are kept and no new poli­cies are added.

...the dis­crep­ancy ba­si­cally im­plies that ev­ery year loads of un­planned sub­sidies are added… So it boils down to: it’s not a fore­cast and any er­ror you find must be at­tributed to that. And no you can­not see how the model works.

The IEA web­site ex­plains the WEO pro­cess: “The de­tailed pro­jec­tions are gen­er­ated by the World En­ergy Model, a large-scale simu­la­tion tool, de­vel­oped at the IEA over a pe­riod of more than 20 years that is de­signed to repli­cate how en­ergy mar­kets func­tion.”

News & Hard to Cat­e­go­rize Con­tent.

Bud­get cred­i­bil­ity, or the abil­ity of gov­ern­ments to ac­cu­rately fore­cast macro-fis­cal vari­ables, is cru­cial for effec­tive pub­lic fi­nance man­age­ment. Fis­cal marks­man­ship anal­y­sis cap­tures the ex­tent of er­rors in the bud­getary fore­cast­ing… Par­ti­tion­ing the sources of er­rors, we iden­ti­fied that the er­rors were more broadly ran­dom than due to sys­tem­atic bias, ex­cept for a few cru­cial macro-fis­cal vari­ables where im­prov­ing the fore­cast­ing tech­niques can provide bet­ter es­ti­mates.

A Bloomberg anal­y­sis of more than 3,200 same-year coun­try fore­casts pub­lished each spring since 1999 found a wide vari­a­tion in the di­rec­tion and mag­ni­tude of er­rors. In 6.1 per­cent of cases, the IMF was within a 0.1 per­centage-point mar­gin of er­ror. The rest of the time, its fore­casts un­der­es­ti­mated GDP growth in 56 per­cent of cases and over­es­ti­mated it in 44 per­cent. The av­er­age fore­cast miss, re­gard­less of di­rec­tion, was 2.0 per­centage points, but ob­scures a no­table differ­ence be­tween the av­er­age 1.3 per­centage-point er­ror for ad­vanced economies com­pared with 2.1 per­centage points for more volatile and harder-to-model de­vel­op­ing economies. Since the fi­nan­cial crisis, how­ever, the IMF’s fore­cast ac­cu­racy seems to have im­proved, as growth num­bers have gen­er­ally fallen.

Bank­ing and sovereign debt pan­ics hit Greece, Ire­land, Por­tu­gal and Cyprus to vary­ing de­grees, threat­en­ing the in­tegrity of the euro area and re­quiring emer­gency in­ter­ven­tion from multi­na­tional au­thor­i­ties. Dur­ing this pe­riod, the IMF wasn’t merely fore­cast­ing what would hap­pen to these coun­tries but also set­ting the terms. It pro­vided billions in bailout loans in ex­change for im­ple­men­ta­tion of strict aus­ter­ity mea­sures and other poli­cies, of­ten bit­terly op­posed by the coun­tries’ cit­i­zens and poli­ti­ci­ans.

  • I keep see­ing ev­i­dence that Trump will lose re­elec­tion, but I don’t know how se­ri­ously to take it, be­cause I don’t know how filtered it is.

  • For example, The Economist’s model forecasts a 91% chance that Biden will win the upcoming USA elections. Should I update somewhat towards Biden winning after seeing it? What if I suspect that it’s the most extreme model, and that it has come to my attention because of that fact? What if I suspect that it’s the most extreme model which will predict a Democratic win? What if there was another equally reputable model which predicts 91% for Trump, but which I never got to see because of information filter dynamics?

  • The Primary Model confirmed my suspicions of filter dynamics. It “does not use presidential approval or the state of the economy as predictors. Instead it relies on the performance of the presidential nominees in primaries”, and on how many terms the party has controlled the White House. The model has been developed by an otherwise unremarkable professor of political science at New York’s Stony Brook University, and has done well in previous election cycles. It assigns 91% to Trump winning reelection.

  • Forecasting at Uber: An Introduction. Uber forecasts demand so that they know, among other things, when and where to direct their vehicles. Because of the challenges of testing and comparing forecasting frameworks at scale, they developed their own software for this.

  • Fore­cast­ing Sales In Th­ese Uncer­tain Times.

[...] a com­pany sel­l­ing to lower-in­come con­sumers might use the monthly em­ploy­ment re­port for the U.S. to see how peo­ple with just a high school ed­u­ca­tion are do­ing find­ing jobs. A busi­ness sel­l­ing lux­ury goods might mon­i­tor the stock mar­ket.

  • Unilever Chief Sup­ply Officer on fore­cast­ing: “Agility does trump fore­cast­ing. At the end of the day, ev­ery dol­lar we spent on ag­ility has prob­a­bly got a 10x re­turn on ev­ery dol­lar spent on fore­cast­ing or sce­nario plan­ning.”

An em­pha­sis on ag­ility over fore­cast­ing meant short­en­ing plan­ning cy­cles — the com­pany re­duced its plan­ning hori­zon from 13 weeks to four. The weekly plan­ning meet­ing be­came a daily meet­ing. Ex­ist­ing de­mand baselines and even ar­tifi­cial in­tel­li­gence pro­grams no longer ap­plied as con­sumer spend­ing and pro­duc­tion ca­pac­ity strayed farther from his­tor­i­cal trends.

This bias to­ward fa­vor­able out­comes… ap­pears for a wide va­ri­ety of nega­tive events, in­clud­ing dis­eases such as can­cer, nat­u­ral dis­asters such as earth­quakes and a host of other events rang­ing from un­wanted preg­nan­cies and radon con­tam­i­na­tion to the end of a ro­man­tic re­la­tion­ship. It also emerges, albeit less strongly, for pos­i­tive events, such as grad­u­at­ing from col­lege, get­ting mar­ried and hav­ing fa­vor­able med­i­cal out­comes.

Nancy Rea­gan hired an as­trologer, Joan Quigley, to screen Ron­ald Rea­gan’s sched­ule of pub­lic ap­pear­ances ac­cord­ing to his horo­scope, allegedly in an effort to avoid as­sas­si­na­tion at­tempts.

Google, Ya­hoo!, Hewlett-Packard, Eli Lilly, In­tel, Microsoft, and France Tele­com have all used in­ter­nal pre­dic­tion mar­kets to ask their em­ploy­ees about the likely suc­cess of new drugs, new prod­ucts, fu­ture sales.

Although pre­dic­tion mar­kets can work well, they don’t always. IEM, Pre­dic­tIt, and the other on­line mar­kets were wrong about Brexit, and they were wrong about Trump’s win in 2016. As the Har­vard Law Re­view points out, they were also wrong about find­ing weapons of mass de­struc­tion in Iraq in 2003, and the nom­i­na­tion of John Roberts to the U.S. Supreme Court in 2005. There are also plenty of ex­am­ples of small groups re­in­forc­ing each other’s mod­er­ate views to reach an ex­treme po­si­tion, oth­er­wise known as group­think, a the­ory de­vised by Yale psy­chol­o­gist Irv­ing Ja­nis and used to ex­plain the Bay of Pigs in­va­sion.

al­though thought­ful traders should ul­ti­mately drive the price, that doesn’t always hap­pen. The [pre­dic­tion] mar­kets are also no less prone to be­ing caught in an in­for­ma­tion bub­ble than Bri­tish in­vestors in the South Sea Com­pany in 1720 or spec­u­la­tors dur­ing the tulip ma­nia of the Dutch Repub­lic in 1637.

Long Con­tent.

  • Michael Story, “Jot­ting down things I learned from be­ing a su­perfore­caster.”

Small teams of smart, fo­cused and ra­tio­nal gen­er­al­ists can ab­solutely smash big well-re­sourced in­sti­tu­tions at knowl­edge pro­duc­tion, for the same rea­sons star­tups can beat big rich in­cum­bent businesses

There’s a lot more to mak­ing pre­dic­tive ac­cu­racy work in prac­tice than win­ning a fore­cast­ing tour­na­ment. Com­pe­ti­tions are about daily frac­tional up­dat­ing, long lead times and ex­haus­tive pre-fore­cast re­search on ques­tions es­pe­cially cho­sen for com­pet­i­tive suitability

Real life fore­cast­ing of­ten re­quires fast turnaround times, fuzzy ques­tions, and difficult-to-define an­swers with un­clear re­s­olu­tion crite­ria. In a com­pe­ti­tion, a ques­tion with am­bigu­ous re­s­olu­tion is thrown out, but in a crisis it might be the most im­por­tant work you do

An am­bi­guity-averse in­di­vi­d­ual would rather choose an al­ter­na­tive where the prob­a­bil­ity dis­tri­bu­tion of the out­comes is known over one where the prob­a­bil­ities are un­known. This be­hav­ior was first in­tro­duced through the Ells­berg para­dox (peo­ple pre­fer to bet on the out­come of an urn with 50 red and 50 blue balls rather than to bet on one with 100 to­tal balls but for which the num­ber of blue or red balls is un­known).

If your best guess for X is 0.37, but you’re very uncertain, you still shouldn’t replace it with an imprecise approximation (e.g. “roughly 0.4”, “fairly unlikely”), as this removes information. It is better to offer your precise estimate, alongside some estimate of its resilience, either subjectively (“0.37, but if I thought about it for an hour I’d expect to go up or down by a factor of 2”), or objectively (“0.37, but I think the standard error for my guess to be ~0.1”).

  • Ex­pert Fore­cast­ing with and with­out Uncer­tainty Quan­tifi­ca­tion and Weight­ing: What Do the Data Say?: “it’s bet­ter to com­bine ex­pert un­cer­tain­ties (e.g. 90% con­fi­dence in­ter­vals) than to com­bine their point fore­casts, and it’s bet­ter still to com­bine ex­pert un­cer­tain­ties based on their past perfor­mance.”

    • See also a 1969 pa­per by fu­ture No­bel Prize win­ner Clive Granger: “Two sep­a­rate sets of fore­casts of air­line pas­sen­ger data have been com­bined to form a com­pos­ite set of fore­casts. The main con­clu­sion is that the com­pos­ite set of fore­casts can yield lower mean-square er­ror than ei­ther of the origi­nal fore­casts. Past er­rors of each of the origi­nal fore­casts are used to de­ter­mine the weights to at­tach to these two origi­nal fore­casts in form­ing the com­bined fore­casts, and differ­ent meth­ods of de­riv­ing these weights are ex­am­ined”.
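To make the idea of weighting by past errors concrete, here is a minimal Python sketch of one simple scheme in that spirit: each forecast gets a weight inversely proportional to its past mean squared error. This is an illustration only, not necessarily any of the specific methods examined in the paper; the function names and the numbers are made up, and the scheme ignores any correlation between the two forecasters’ errors.

```python
def inverse_mse_weights(errors_a, errors_b):
    """Weights for two forecasters, inversely proportional to past mean squared error."""
    mse_a = sum(e * e for e in errors_a) / len(errors_a)
    mse_b = sum(e * e for e in errors_b) / len(errors_b)
    w_a = mse_b / (mse_a + mse_b)  # the forecaster with smaller errors gets more weight
    return w_a, 1 - w_a


def combine(forecast_a, forecast_b, w_a):
    """Weighted average of two point forecasts."""
    return w_a * forecast_a + (1 - w_a) * forecast_b


# Hypothetical past errors (forecast minus outcome) for two forecasters.
past_errors_a = [1, -1, 2, -2]
past_errors_b = [3, -3, 4, -4]

w_a, w_b = inverse_mse_weights(past_errors_a, past_errors_b)
combined = combine(100.0, 110.0, w_a)
print(f"weights: {w_a:.3f}, {w_b:.3f}; combined forecast: {combined:.2f}")
```

The combined forecast lands closer to the historically more accurate forecaster, without discarding the other one entirely.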

  • How to build your own weather forecasting model. Sailors realize that weather forecasts are often corrupted by different considerations (e.g., a reported 50% chance of rain doesn’t happen 50% of the time), and search for better sources. One such source is the original, raw data used to generate weather forecasts: GRIB files (Gridded Information in Binary), which lack interpretation. But these have their own pitfalls, which sailors must learn to take into account. For example, GRIB files only take into account wind speed, not tidal acceleration, which can cause a significant increase in apparent wind.

‘Fore­casts are in­her­ently poli­ti­cal,’ says Dashew. ‘They are the re­sult of peo­ple per­haps get­ting it wrong at some point so some pres­sures to in­ter­pret them in a differ­ent or more con­ser­va­tive way very of­ten. Th­ese pres­sures change all the time so they are of­ten sub­ject to out­side fac­tors.’

Sin­gle­ton says he un­der­stands how pres­sures on fore­cast­ers can lead to this opinion be­ing formed: ‘In my days at the Met Office when the Ship­ping Fore­cast used to work un­der me, they always said they try to tell it like it is and they do not try to make it sound worse.’

  • Forecasting the dividends of conflict prevention from 2020 to 2030. The study quantifies the dynamics of conflict, building a transition matrix between different states (peace, high risk, negative peace, war, and recovery) and validating it using a historical dataset; the authors find, concurring with the previous literature, that countries have a tendency to fall into cycles of conflict. They conclude that changing this transition matrix would have a very high impact. Warning: extensive quoting follows.

Notwith­stand­ing the man­date of the United Na­tions to pro­mote peace and se­cu­rity, many mem­ber states are still scep­ti­cal about the div­i­dends of con­flict pre­ven­tion. Their diplo­mats ar­gue that it is hard to jus­tify in­vest­ments with­out be­ing able to show its tan­gible re­turns to de­ci­sion-mak­ers and tax­pay­ers. As a re­sult, sup­port for con­flict pre­ven­tion is halt­ing and un­even, and gov­ern­ments and in­ter­na­tional agen­cies end up spend­ing enor­mous sums in sta­bil­ity and peace sup­port op­er­a­tions af­ter-the-fact.

This study con­sid­ers the tra­jec­to­ries of armed con­flict in a ‘busi­ness-as-usual’ sce­nario be­tween 2020-2030. Speci­fi­cally, it draws on a com­pre­hen­sive his­tor­i­cal dataset to de­ter­mine the num­ber of coun­tries that might ex­pe­rience ris­ing lev­els of col­lec­tive vi­o­lence, out­right armed con­flict, and their as­so­ci­ated eco­nomic costs. It then simu­lates al­ter­na­tive out­comes if con­flict pre­ven­tion mea­sures were 25%, 50%, and 75% more effec­tive. As with all pro­jec­tions, the qual­ity of the pro­jec­tions re­lies on the in­tegrity of the un­der­ly­ing data. The study re­views sev­eral limi­ta­tions of the anal­y­sis, and un­der­lines the im­por­tance of a cau­tious in­ter­pre­ta­tion of the find­ings.

If cur­rent trends per­sist and no ad­di­tional con­flict pre­ven­tion ac­tion is taken above the cur­rent baseline, then it is ex­pected that there will be three more coun­tries at war and nine more coun­tries at high risk of war by 2030 as com­pared to 2020. This trans­lates into roughly 677,250 con­flict-re­lated fatal­ities (civilian and bat­tle-deaths) be­tween the pre­sent and 2030. By con­trast, un­der our most pes­simistic sce­nario, a 25% in­crease in effec­tive­ness of con­flict pre­ven­tion would re­sult in 10 more coun­tries at peace by 2030, 109,000 fewer fatal­ities over the next decade and sav­ings of over $3.1 trillion. A 50% im­prove­ment would re­sult in 17 ad­di­tional coun­tries at peace by 2030, 205,000 fewer deaths by 2030, and some $6.6 trillion in sav­ings.

Mean­while, un­der our most op­ti­mistic sce­nario, a 75% im­prove­ment in pre­ven­tion would re­sult in 23 more coun­tries at peace by 2030, re­sult­ing in 291,000 lives saved over the next decade and $9.8 trillion in sav­ings. Th­ese sce­nar­ios are ap­prox­i­ma­tions, yet demon­strate con­crete and defen­si­ble es­ti­mates of both the benefits (saved lives, dis­place­ment avoided, de­clin­ing peace­keep­ing de­ploy­ments) and cost-effec­tive­ness of pre­ven­tion (re­cov­ery aid, peace­keep­ing ex­pen­di­tures). Wars are costly and the avoidance of “con­flict traps” could save the econ­omy trillions of dol­lars by 2030 un­der the most op­ti­mistic sce­nar­ios. The bot­tom line is that com­par­a­tively mod­est in­vest­ments in pre­ven­tion can yield last­ing effects by avoid­ing com­pound­ing costs of lost life, peace­keep­ing, and aid used for hu­man­i­tar­ian re­sponse and re­build­ing rather than de­vel­op­ment. The longer con­flict pre­ven­tion is de­layed, the more ex­pen­sive re­sponses to con­flict be­come.

In or­der to es­ti­mate the div­i­dends of con­flict pre­ven­tion we an­a­lyze vi­o­lence dy­nam­ics in over 190 coun­tries over the pe­riod 1994 to 2017, a time pe­riod for which most data was available for most coun­tries. Draw­ing on 12 risk vari­ables, the model ex­am­ines the like­li­hood that a war will oc­cur in a coun­try in the fol­low­ing year and we es­ti­mate (through lin­ear, fixed effects re­gres­sions) the av­er­age cost of war (and other ‘states’, de­scribed be­low) on 8 de­pen­dent vari­ables, in­clud­ing loss of life, dis­place­ment, peace­keep­ing de­ploy­ments and ex­pen­di­tures, over­sea aid and eco­nomic growth. The es­ti­mates con­firm that, by far, the most costly state for a coun­try to be in is war, and the prob­a­bil­ity of a coun­try suc­cumb­ing to war in the next year is based on its cur­rent state and the fre­quency of other coun­tries with similar states hav­ing en­tered war in the past.

At the core of the model (and re­sults) is the re­al­ity that coun­tries tend to get stuck in so-called vi­o­lence and con­flict traps. A well-es­tab­lished find­ing in the con­flict stud­ies field is that once a coun­try ex­pe­riences an armed con­flict, it is very likely to re­lapse into con­flict or vi­o­lence within a few years. Fur­ther­more, coun­tries likely to ex­pe­rience war share some com­mon warn­ing signs, which we re­fer to as “flags” (up to 12 flags can be raised to sig­nal risk). Not all coun­tries that en­ter armed con­flict raise the same warn­ing flags, but the warn­ing flags are nev­er­the­less a good in­di­ca­tion that a coun­try is at high risk. Th­ese effects cre­ate vi­cious cy­cles that re­sult in high risk, war and fre­quent re­lapse into con­flict. Mul­ti­ple forms of pre­ven­tion are nec­es­sary to break these cy­cles. The model cap­tures the vi­cious cy­cle of con­flict traps, through in­tro­duc­ing five states and a tran­si­tion ma­trix based on his­tor­i­cal data (see Table 1). First, we as­sume that a coun­try is in one of five ‘states’ in any given year. Th­ese ‘states’ are at “Peace”, “High Risk”, “Nega­tive Peace”, “War” and “Re­cov­ery” (each state is de­scribed fur­ther be­low). Draw­ing on his­tor­i­cal data, the model as­sesses the prob­a­bil­ity of a coun­try tran­si­tion­ing to an­other state in a given year (a tran­si­tion ma­trix).

For ex­am­ple, if a state was at High Risk in the last year, it has a 19.3% chance of tran­si­tion­ing to Peace, a 71.4% chance of stay­ing High Risk, a 7.6% chance of en­ter­ing Nega­tive Peace and a 1.7% chance of en­ter­ing War the fol­low­ing year.
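The quoted row of the transition matrix is enough to play with. The sketch below, in Python, encodes only the High Risk row given above (the rest of the matrix is not reproduced in this newsletter, so it is left out), checks that it sums to one, and samples a next-year state from it. The helper names are my own, and treating each year as an independent draw with fixed probabilities is the report’s Markov-style assumption, not an extra claim.

```python
import math
import random

# Transition probabilities out of "High Risk", as quoted from the report.
HIGH_RISK_ROW = {
    "Peace": 0.193,
    "High Risk": 0.714,
    "Negative Peace": 0.076,
    "War": 0.017,
}


def next_state(row, rng):
    """Sample next year's state from one row of the transition matrix."""
    states = list(row)
    weights = list(row.values())
    return rng.choices(states, weights=weights, k=1)[0]


def prob_still_high_risk(years):
    """Probability of staying High Risk for `years` consecutive transitions."""
    return HIGH_RISK_ROW["High Risk"] ** years


assert math.isclose(sum(HIGH_RISK_ROW.values()), 1.0)

rng = random.Random(1)
draws = [next_state(HIGH_RISK_ROW, rng) for _ in range(10_000)]
share_staying = draws.count("High Risk") / len(draws)
print(f"simulated share staying High Risk: {share_staying:.3f}")
print(f"still High Risk after 5 years: {prob_still_high_risk(5):.3f}")
```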

By con­trast, high risk states are des­ig­nated by the rais­ing of up to 12 flags. Th­ese in­clude: 1) high scores by Amnesty In­ter­na­tional’s an­nual hu­man rights re­ports (source: Poli­ti­cal Ter­ror Scale), 2) the US State Depart­ment an­nual re­ports (source: Poli­ti­cal Ter­ror Scale), 3) civilian fatal­ities as a per­centage of pop­u­la­tion (source: ACLED), 4) poli­ti­cal events per year (source: ACLED) 5) events at­tributed to the pro­lifer­a­tion of non-state ac­tors (source: ACLED), 6) bat­tle deaths (source: UCDP), 7) deaths by ter­ror­ism (source: GTD), 8) high lev­els of crime (source: UNODC), 9) high lev­els of prison pop­u­la­tion (source: UNODC), 10) eco­nomic growth shocks (source: World Bank), 11) dou­bling of dis­place­ment in a year (source: IDMC), and 12) dou­bling of re­fugees in a year (source: UNHCR). Coun­tries with two or more flags fall into the “high risk” cat­e­gory. Us­ing these flags, a ma­jor­ity of coun­tries have been at high risk for one or more years from 1994 to 2017, so it is eas­ier to give ex­am­ples of coun­tries that have not been at high risk.

Nega­tive peace states are defined by com­bined scores from Amnesty In­ter­na­tional and the US State Depart­ment. Coun­tries in nega­tive peace are more than five times as likely to en­ter high risk in the fol­low­ing year than peace (26.8% vs. 4.1%).

A coun­try that is at war is one that falls into a higher thresh­old of col­lec­tive vi­o­lence, rel­a­tive to the size of the pop­u­la­tion. Speci­fi­cally, it is des­ig­nated as such if one or more of the fol­low­ing con­di­tions are met: above 0.04 bat­tle deaths or .04 civilian fatal­ities per 100,000 ac­cord­ing to UCDP and ACLED, re­spec­tively, or cod­ing of geno­cide by the Poli­ti­cal In­sta­bil­ity Task Force Wor­ld­wide Atroc­i­ties Dataset. Coun­tries ex­pe­rienc­ing five or more years of war be­tween 1994 and 2017 in­cluded Afghanistan, So­ma­lia, Su­dan, Iraq, Bu­rundi, Cen­tral Afri­can Repub­lic, Sri Lanka, DR Congo, Uganda, Chad, Colom­bia, Is­rael, Le­banon, Libe­ria, Ye­men, Alge­ria, An­gola, Sierra Leone, South Su­dan, Eritrea and Libya.

Lastly, re­cov­ery is a pe­riod of sta­bil­ity that fol­lows from war. A coun­try is only de­ter­mined to be re­cov­er­ing if it is not at war and was re­cently in a war. Any coun­try that ex­its in the war state is im­me­di­ately coded as be­ing in re­cov­ery for the fol­low­ing five years, un­less it re­lapses into war. The du­ra­tion of the re­cov­ery pe­riod (five years) is in­formed by the work of Paul Col­lier et al, but is ro­bust also to sen­si­tivity tests around vary­ing re­cov­ery lengths.

The model does not al­low for coun­tries to be high risk and in re­cov­ery in the same year, but there is am­ple ev­i­dence that coun­tries that are leav­ing a war state are at a sub­stan­tially higher risk of ex­pe­rienc­ing war re­cur­rence, con­tribut­ing to the con­flict trap de­scribed ear­lier. Coun­tries are twice as likely to en­ter high risk or nega­tive peace com­ing out of re­cov­ery as they are to en­ter peace, and 10.2% of coun­tries in re­cov­ery re­lapse into war ev­ery year. When a coun­try has passed the five year thresh­old with­out re­vert­ing to war, it can move back to states of peace, nega­tive peace or high risk.

The tran­si­tion ma­trix un­der­lines the very real risk of coun­tries fal­ling into a ‘con­flict trap’. Speci­fi­cally, a coun­try that is in a state of war has a very high like­li­hood of stay­ing in this con­di­tion in the next year (72.6%) and just a 27.4% chance of tran­si­tion­ing to re­cov­ery. Once in re­cov­ery, a coun­try has a 10.2% chance of re­lapse ev­ery year, sug­gest­ing only a 58% chance (1-10.2%)^5 that a coun­try will not re­lapse over five years.
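The arithmetic in that paragraph is easy to reproduce. The Python sketch below recomputes the chance of getting through a five-year recovery without relapse, and adds one number the report does not state: the expected length of a war spell implied by the 72.6% yearly persistence. That second figure is my own inference and assumes the yearly probability is constant and independent of how long the war has lasted.

```python
P_RELAPSE_PER_YEAR = 0.102   # recovery -> war, per year (from the report)
P_STAY_AT_WAR = 0.726        # war -> war, per year (from the report)
RECOVERY_YEARS = 5

# Chance of no relapse across the whole recovery period.
p_no_relapse = (1 - P_RELAPSE_PER_YEAR) ** RECOVERY_YEARS

# Mean of a geometric distribution: expected years spent at war once a war
# starts, assuming a constant exit probability each year (my assumption).
expected_war_years = 1 / (1 - P_STAY_AT_WAR)

print(f"no relapse over {RECOVERY_YEARS} years: {p_no_relapse:.1%}")
print(f"expected war duration: {expected_war_years:.2f} years")
```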

As Col­lier and oth­ers have ob­served, coun­tries are of­ten caught in pro­longed and vi­cious cy­cles of war and re­cov­ery (con­flict traps), of­ten un­able to es­cape into a new, more peace­ful (or less war-like) state

  • War is ex­pen­sive. So is be­ing at high risk of war.

Of course, the loss of life, dis­place­ment, and ac­cu­mu­lated mis­ery as­so­ci­ated with war should be rea­son enough to in­vest in pre­ven­tion, but there are also mas­sive eco­nomic benefits from suc­cess­ful pre­ven­tion. Fore­most, the coun­tries at war avoid the costly years in con­flict, with growth rates 4.8% lower than coun­tries at peace. They also avoid years of re­cov­ery and the risk of re­lapse into con­flict. Where pre­ven­tion works, con­flict-driven hu­man­i­tar­ian needs are re­duced, and the in­ter­na­tional com­mu­nity avoids peace­keep­ing de­ploy­ments and ad­di­tional aid bur­dens, which are siz­able.

Conclusion: The world can be significantly better off by addressing the high risk of destructive violence and war with focused efforts at prevention in countries at high risk and those in negative peace. This group of countries has historically been at risk of higher conflict due to violence against civilians, proliferation of armed groups, abuses of human rights, forced displacement, high homicide, and incidence of terror. None of this is surprising. Policymakers know that war is bad for humans and other living things. What is staggering is the annual costs of war that we will continue to pay in 2030 through inaction today – conceivably trillions of dollars of economic growth, and the associated costs of this for human security and development, are being swept off the table by the decisions made today to ignore prevention.

On the one hand, Nas­sim Taleb has clearly ex­pressed that mea­sures to stop the spread of the pan­demic must be taken as soon as pos­si­ble: in­stead of look­ing at data, it is the na­ture of a pan­demic with a pos­si­bil­ity of dev­as­tat­ing hu­man im­pact that should drive our de­ci­sions.

On the other hand, John Ioannidis acknowledges the difficulty of obtaining good data and of producing accurate forecasts, while believing that any information that can eventually be extracted from such data and forecasts should still be useful, e.g. for targeted lockdowns (in space, time, and considering the varying risk for different segments of the population).

  • Taleb: On sin­gle point fore­casts for fat tailed vari­ables. Leit­mo­tiv: Pan­demics are fat-tailed.

We do not need more ev­i­dence un­der fat tailed dis­tri­bu­tions — it is there in the prop­er­ties them­selves (prop­er­ties for which we have am­ple ev­i­dence) and these clearly rep­re­sent risk that must be kil­led in the egg (when it is still cheap to do so). Se­condly, un­re­li­able data — or any source of un­cer­tainty — should make us fol­low the most para­noid route. [...] more un­cer­tainty in a sys­tem makes pre­cau­tion­ary de­ci­sions very easy to make (if I am un­cer­tain about the skills of the pi­lot, I get off the plane).

Ran­dom vari­ables in the power law class with tail ex­po­nent α ≤ 1 are, sim­ply, not fore­castable. They do not obey the [Law of Large Num­bers]. But we can still un­der­stand their prop­er­ties.

As a matter of fact, owing to preasymptotic properties, a heuristic is to consider variables with up to α ≤ 5/2 as not forecastable: the mean will be too unstable and requires way too much data for it to be possible to do so in reasonable time. It takes 10^14 observations for a “Pareto 80/20” (the most commonly referred to probability distribution, that is with α ≈ 1.13) for the average thus obtained to emulate the significance of a Gaussian with only 30 observations.
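To get a feel for the instability described in the excerpt above, one can compare how much sample means wander for a Pareto with α = 1.13 and for a Gaussian. This is only an illustrative simulation, not Taleb's derivation: the sample size, the number of trials, the seed, the Gaussian's parameters and the `relative_spread` measure are all assumptions chosen for the sketch.

```python
import random
import statistics

ALPHA = 1.13     # tail exponent quoted above for a "Pareto 80/20"
N_OBS = 1_000    # observations per sample mean (assumed, for illustration)
N_TRIALS = 200   # number of repeated samples (assumed)

def sample_means(draw, rng, n_obs=N_OBS, n_trials=N_TRIALS):
    """Return n_trials sample means, each computed from n_obs draws."""
    return [statistics.fmean(draw(rng) for _ in range(n_obs))
            for _ in range(n_trials)]

def relative_spread(means):
    """Interdecile range of the sample means, divided by their median."""
    deciles = statistics.quantiles(means, n=10)
    return (deciles[-1] - deciles[0]) / statistics.median(means)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Pareto with minimum value 1; its theoretical mean is ALPHA / (ALPHA - 1).
pareto_mean = ALPHA / (ALPHA - 1)
pareto_means = sample_means(lambda r: r.paretovariate(ALPHA), rng)

# Gaussian with the same theoretical mean and unit standard deviation (assumed).
gauss_means = sample_means(lambda r: r.gauss(pareto_mean, 1.0), rng)

print("Pareto  :", round(relative_spread(pareto_means), 3))
print("Gaussian:", round(relative_spread(gauss_means), 3))
```

With a thousand observations per sample, the Gaussian means sit in a narrow band while the Pareto means still move around a great deal from sample to sample, which is the practical sense in which the mean of such a variable is hard to pin down from data.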

  • Ioan­ni­dis: Fore­cast­ing for COVID-19 has failed. Leit­mo­tiv: “In­vest­ment should be made in the col­lec­tion, clean­ing and cu­ra­tion of data”.

Pre­dic­tions for hos­pi­tal and ICU bed re­quire­ments were also en­tirely mis­in­form­ing. Public lead­ers trusted mod­els (some­times even black boxes with­out dis­closed method­ol­ogy) in­fer­ring mas­sively over­whelmed health care ca­pac­ity (Table 1) [3]. How­ever, even­tu­ally very few hos­pi­tals were stressed, for a cou­ple of weeks. Most hos­pi­tals main­tained largely empty wards, wait­ing for tsunamis that never came. The gen­eral pop­u­la­tion was locked and placed in hor­ror-alert to save the health sys­tem from col­laps­ing. Trag­i­cally, many health sys­tems faced ma­jor ad­verse con­se­quences, not by COVID-19 cases over­load, but for very differ­ent rea­sons. Pa­tients with heart at­tacks avoided vis­it­ing hos­pi­tals for care [4], im­por­tant treat­ments (e.g. for can­cer) were un­jus­tifi­ably de­layed [5], men­tal health suffered [6]. With dam­aged op­er­a­tions, many hos­pi­tals started los­ing per­son­nel, re­duc­ing ca­pac­ity to face fu­ture crises (e.g. a sec­ond wave). With mas­sive new un­em­ploy­ment, more peo­ple may lose health in­surance. The prospects of star­va­tion and of lack of con­trol for other in­fec­tious dis­eases (like tu­ber­cu­lo­sis, malaria, and child­hood com­mu­ni­ca­ble dis­eases for which vac­ci­na­tion is hin­dered by the COVID-19 mea­sures) are dire...

The core evidence to support “flatten-the-curve” efforts was based on observational data from the 1918 Spanish flu pandemic on 43 US cities. These data are >100-years old, of questionable quality, unadjusted for confounders, based on ecological reasoning, and pertaining to an entirely different (influenza) pathogen that had ~100-fold higher infection fatality rate than SARS-CoV-2. Even thus, the impact on reduction on total deaths was of borderline significance and very small (10-20% relative risk reduction); conversely many models have assumed 25-fold reduction in deaths (e.g. from 510,000 deaths to 20,000 deaths in the Imperial College model) with adopted measures.

De­spite these ob­vi­ous failures, epi­demic fore­cast­ing con­tinued to thrive, per­haps be­cause vastly er­ro­neous pre­dic­tions typ­i­cally lacked se­ri­ous con­se­quences. Ac­tu­ally, er­ro­neous pre­dic­tions may have been even use­ful. A wrong, dooms­day pre­dic­tion may in­cen­tivize peo­ple to­wards bet­ter per­sonal hy­giene. Prob­lems starts when pub­lic lead­ers take (wrong) pre­dic­tions too se­ri­ously, con­sid­er­ing them crys­tal balls with­out un­der­stand­ing their un­cer­tainty and the as­sump­tions made. Slaugh­ter­ing mil­lions of an­i­mals in 2001 ag­gra­vated a few an­i­mal busi­ness stake­hold­ers, most cit­i­zens were not di­rectly af­fected. How­ever, with COVID-19, es­poused wrong pre­dic­tions can dev­as­tate billions of peo­ple in terms of the econ­omy, health, and so­cietal tur­moil at-large.

Cirillo and Taleb thought­fully ar­gue [14] that when it comes to con­ta­gious risk, we should take dooms­day pre­dic­tions se­ri­ously: ma­jor epi­demics fol­low a fat-tail pat­tern and ex­treme value the­ory be­comes rele­vant. Ex­am­in­ing 72 ma­jor epi­demics recorded through his­tory, they demon­strate a fat-tailed mor­tal­ity im­pact. How­ever, they an­a­lyze only the 72 most no­ticed out­breaks, a sam­ple with as­tound­ing se­lec­tion bias. The most fa­mous out­breaks in hu­man his­tory are prefer­en­tially se­lected from the ex­treme tail of the dis­tri­bu­tion of all out­breaks. Tens of mil­lions of out­breaks with a cou­ple deaths must have hap­pened through­out time. Prob­a­bly hun­dreds of thou­sands might have claimed dozens of fatal­ities. Thou­sands of out­breaks might have ex­ceeded 1,000 fatal­ities. Most eluded the his­tor­i­cal record. The four gar­den va­ri­ety coro­n­aviruses may be caus­ing such out­breaks ev­ery year [15,16]. One of them, OC43 seems to have been in­tro­duced in hu­mans as re­cently as 1890, prob­a­bly caus­ing a “bad in­fluenza year” with over a mil­lion deaths [17]. Based on what we know now, SARS-CoV-2 may be closer to OC43 than SARS-CoV-1. This does not mean it is not se­ri­ous: its ini­tial hu­man in­tro­duc­tion can be highly lethal, un­less we pro­tect those at risk.

  • The (British) Royal Economic Society presents a panel on What is a scenario, projection and a forecast - how good or useful are they particularly now? The start seems promising: “My professional engagement with economic and fiscal forecasting was first as a consumer, and then a producer. I spent a decade happily mocking other people’s efforts, as a journalist, since when I’ve spent two decades helping colleagues to construct forecasts and to try to explain them to the public.” The first speaker, who takes up the first ten minutes, is worth listening to; the rest varies in quality.

You have to construct the forecast and explain it in a way that’s fit for that purpose.

  • I liked the following taxonomy of the distinct targets that the first speaker’s agency aims to hit with its forecasts:

    1. as an in­put into the policy-mak­ing pro­cess,

    2. as a transparent assessment of public finances,

    3. as a pre­dic­tion of whether the gov­ern­ment will meet what­ever fis­cal rules it has set it­self,

    4. as a baseline against which to judge the sig­nifi­cance of fur­ther news,

    5. as a challenge to other agen­cies “to keep the bas­tards hon­est”.

  • The limi­ta­tions were in­ter­est­ing as well:

    1. They require us to produce a forecast that’s conditioned on current government policy, even if we and everyone else expect that policy to change; that of course makes it hard to benchmark our performance against counterparts who are producing unconditional forecasts.

    2. The fore­casts have to be ex­plain­able; a black box model might be more ac­cu­rate but be less use­ful.

    3. They require detailed discussion of the individual forecast lines and clear diagnostics to explain changes from one forecast to the next, precisely to reassure people that those changes aren’t politically motivated or tainted; the forecast is as much about delivering transparency and accountability as about demonstrating predictive prowess.

    4. The forecast numbers really have to be accompanied by a comprehensible narrative of what is going on in the economy and the public finances and what impact policy will have. Parliament and the public need to be able to engage with the forecast; we couldn’t justify our predictions simply with an appeal to a statistical black box, and the Chancellor certainly couldn’t justify significant policy positions that way.

“horses for courses, the way you do the fore­cast, the way you pre­sent it de­pends on what you’re try­ing to achieve with it”

“People use scenario forecasting in a very informal manner, which I think could be problematic, because it’s very difficult to basically find out what the assumptions are and whether those assumptions and the models and the laws can be validated.”

Linear models are state independent, but receiving a shock when the economy is in an upswing is not the same as receiving one when the economy is in a recession.

  • Some situ­a­tions are too com­pli­cated to fore­cast, so one con­di­tions on some vari­ables be­ing known, or fol­low­ing a given path, and then stud­ies the rest, call­ing the out­put a “sce­nario.”

One week de­lay in in­ter­ven­tion by the gov­ern­ment makes a big differ­ence to the height of the [covid-19] curve.

I don’t think it’s easy to fol­low the old way of do­ing things. I’m sorry, I have to be hon­est with you. I spent 4 months just think­ing about this prob­lem and you need to in­te­grate a model of the so­cial be­hav­ior and how you deal with the risk to health and to econ­omy in these mod­els. But un­for­tu­nately, by the time we do that it won’t be rele­vant.

It amuses me to look at weather fore­casts be­cause economists don’t have that kind of tech­nol­ogy, those kind of re­sources.

Note to the fu­ture: All links are added au­to­mat­i­cally to the In­ter­net Archive. In case of link rot, go here
