State Space of X-Risk Trajectories

Justin Shovelain developed the core ideas in this article and assisted in writing; David Kristoffersson was the lead writer and editor.

Cross-posted on the EA Forum

Abstract

Currently, people tend to use many key concepts informally when reasoning and forming strategies and policies for existential risks (x-risks). A well-defined formalization and graphical language for paths and choices would help us pin down more exactly what we think and let us see relations and contrasts more easily. We construct a common state space for futures, trajectories, and interventions, and show how these interact. The space gives us a possible beginning of a more precise language for reasoning and communicating about the trajectory of humanity and how different decisions may affect it.

Introduction

Understanding the possible trajectories of human civilization and the futures they imply is key to steering development towards safe and beneficial outcomes. The trajectory of human civilization will be highly impacted by the development of advanced technology, such as synthetic biology, nanotechnology, or artificial general intelligence (AGI). Reducing existential risks means intervening on the trajectory civilization takes. To identify effective interventions, we need to be able to answer questions like: how close are we to good and bad outcomes? How probable are the good and bad outcomes? What levers in the system could allow us to shape its trajectory?

Previous work has modeled some aspects of existential risk trajectories. For example, various surveys and extrapolations have been made to forecast AGI timelines [1; 2; 3; 4; 5], and scenario and risk modeling have provided frameworks for some aspects of risks and interventions [6; 7; 8]. The paper Long-Term Trajectories of Human Civilization [9] offers one form of visualization of civilizational trajectories. However, so far none of these works have defined a unified graphical framework for futures, trajectories, and interventions. Without developing more and better intellectual tools to examine the possible trajectories of the world and how to shape them, we are likely to remain confused about many of the requirements for reaching a flourishing future, causing us to take less effective actions and leaving us ill-prepared for the events ahead of us.

We construct a state space model of existential risk, where our closeness to stably good and bad outcomes is represented as coordinates, ways the world could develop are paths in this space, our actions change the shapes of these paths, and the likelihood of good or bad outcomes is based on how many paths intercept those outcomes. We show how this framework provides new theoretical foundations to guide the search for interventions that can help steer the development of technologies such as AGI in a safe and beneficial direction.

This is part of Convergence’s broader efforts to construct new tools for generating, mapping out, testing, and exploring timelines, interventions, and outcomes, and showing how these all interact. The state space of x-risk trajectories could form a cornerstone in this larger framework.

State Space Model of X-Risk Trajectories

As humanity develops advanced technology, we want to move closer to beneficial outcomes, stay away from harmful ones, and better understand where we are. In particular, we want to understand where we are in terms of existential risk from advanced technology. Formalization and visual graphs can help us think about this more clearly and effectively. Graphs make relationships between variables clearer, allow us to take in a lot of information in a single glance, and let us discern various patterns, like derivatives, oscillations, and trends. The state space model formalizes x-risk trajectories and provides us the power of visual graphs. To construct the state space, we will define x-risk trajectories geometrically. Thus, we need to formalize position, distance, and trajectories in terms of existential risk, and we need to incorporate uncertainty.

In the state space model of existential risk, current and future states of the world are points, possible progressions through time are trajectories through these points, and stably good or bad futures occur when any coordinate drops below zero (i.e., these futures are absorbing states).

Stable futures are futures of either existential catastrophe or existential safety. Stably good futures (compare to Bostrom’s ‘OK outcomes’, as in [10]) are those where society has achieved enough wisdom and coordination to guarantee the future against existential risks and other dystopian outcomes, perhaps with the aid of Friendly AI (FAI). Stably bad futures (‘bad outcomes’) are those where existential catastrophe has occurred.

[Fig 1: Trajectories illustration, showing two possible start positions and an intervention (black line) bending trajectories in the FAI/UFAI space]

While the framework presented here can be used to analyse any specific existential risk, or existential risks in general, in this article we will illustrate it with a scenario where humanity may develop FAI or “unfriendly” AI (UFAI). For the simplest visualization of the state space, one can draw a two-dimensional coordinate system, and let the x-coordinates below 0 be the “UFAI” part of the space, and the y-coordinates below 0 be the “FAI” part of the space. The world will then take positions in the upper right quadrant, with the x-coordinate being the world’s distance from a UFAI future, and the y-coordinate being the world’s distance from an FAI future. As time progresses, the world will probably move closer to one or both of these futures, tracing out a trajectory through this space. By understanding the movement of the world through this space, one can understand which future we are headed for (in this example, whether we look likely to end up in the FAI future or the UFAI future). [1]
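
To make the geometry concrete, here is a minimal sketch (in Python) of the two-dimensional example space: the world is a point in the upper right quadrant, and a stable future is reached when the corresponding coordinate drops to zero or below. The class and the numbers are purely illustrative assumptions, not part of the model’s formal definition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorldState:
    """Position of the world in the two-dimensional example space.

    dist_ufai: remaining distance (by some chosen metric) to a UFAI outcome (x-axis).
    dist_fai: remaining distance to an FAI outcome (y-axis).
    """
    dist_ufai: float
    dist_fai: float

    def absorbed(self) -> Optional[str]:
        """Return which stable future, if any, has been reached (an absorbing state)."""
        if self.dist_ufai <= 0:
            return "UFAI"  # stably bad future
        if self.dist_fai <= 0:
            return "FAI"   # stably good future
        return None

# An illustrative world 40 "units" from UFAI and 60 from FAI (made-up numbers).
state = WorldState(dist_ufai=40.0, dist_fai=60.0)
print(state.absorbed())  # None: still in the upper right quadrant
```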

Part of the advantage of this approach is as a cognitive aid to facilitate better communication of possible scenarios between existential risk researchers. For instance, there might be sensible (but implicit) variance in their estimates of the current distance to UFAI and/or FAI (perhaps because one researcher thinks UFAI will be much easier to build than the other researcher thinks). In Fig 1, we show this as two possible start positions. They might also agree about how an intervention represented by the black line (perhaps funding a particular AI safety research agenda) would affect trajectories. But because they disagree on the world’s current position in the space, they’ll disagree on whether that intervention is enough. (Fig 1 indicates that, if the world is at Start 1, the intervention will not have had a chance to substantially bend the trajectory before UFAI is reached.) If the researchers can both see the AGI trajectory space like this, they can identify their precise point of disagreement, and thus have a better chance of productively learning from and resolving their disagreement.

Essentially, the state space is a way to describe where the world is, where we want to go, and what we want to steer well clear of. We proceed by outlining a number of key considerations for the state space.

Trajectories of the world

We want to understand what outcomes are possible and likely in the future. We do this by projecting from past trends into the future and by building an understanding of the system’s dynamics. A trajectory is a path in the state space between points in the past, present, or future that may move society closer to or farther away from certain outcomes.

As time progresses, the state of the world changes, and thus the position of the world in the space changes. As technology becomes more advanced, more extreme outcomes become possible, and the world moves closer to both the possibilities of existential catastrophe and existential safety. Given the plausible trajectories of the world, one can work out probabilities for the eventual occurrence of each stably good or bad future.

In the example, by drawing trajectories of the movement of the world, one can study how the world is or could be moving in relation to the FAI and UFAI futures. As a simplified illustration of how a trajectory may be changed: say society decided to stop developing generic AGI capability and focused purely and effectively on FAI; this might change the trajectory to a straight line moving towards FAI, assuming it’s possible to decouple progress on the two axes.

Defining distance more exactly

We need a notion of distance that is conducive to measurement, prediction, and action. The definition of distance is central to the meaning of the space and determines much of the mechanics of the model. The way we’ve described the state space thus far leaves it compatible with many different types of distance. However, in order to fully specify the space, one needs to choose one distance. Possible interesting choices for defining distance include: work hours, bits of information, computational time, actual time, and probabilistic distance. Further, building an ensemble of different metrics would allow us to make stronger assessments.
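
As a hedged illustration of the ensemble idea, the snippet below combines several hypothetical distance estimates, each expressed in its own units, by normalizing them to the fraction of the distance still to cover. The metrics, numbers, and equal weighting are assumptions made up for the example, not estimates from the article.

```python
import numpy as np

# Hypothetical distance estimates to one outcome, each in its own units. None of these
# numbers come from the article; they only illustrate how an ensemble of metrics might
# be normalized and combined into a single assessment.
estimates = {
    "work_hours":     {"remaining": 2.0e8, "total_expected": 5.0e8},
    "key_insights":   {"remaining": 12, "total_expected": 20},
    "calendar_years": {"remaining": 25, "total_expected": 60},
}

# Normalize each metric to the fraction of the distance still to cover, then combine
# with equal (assumed) weights.
fractions = {name: est["remaining"] / est["total_expected"] for name, est in estimates.items()}
ensemble_distance = float(np.mean(list(fractions.values())))
print(fractions)
print("combined normalized distance:", round(ensemble_distance, 3))
```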

Uncertainty over positions and trajectories

We need to take uncertainty into account to properly reflect what we know about the world and the future. There is much uncertainty about how likely we are to reach various societal outcomes, whether due to uncertainty about our current state, about our trajectory, or about the impact certain interventions would have. By effectively incorporating uncertainties into our model, we can more clearly see what we don’t know (and what we should investigate further), draw more accurate conclusions, and make better plans.

Taking uncertainty into account means having probability distributions over positions and over the shape and speed of trajectories. Technically speaking, trajectories are represented as probability distributions that vary with time and that are, roughly speaking, determined by taking the initial probability distribution and repeatedly applying a transition matrix to it. This is a stochastic process that would look something like a random walk (such as in Brownian motion) drifting in a particular direction. (We’d also ideally have a probability function over both the shape and speed of trajectories in a way that doesn’t treat the shape and speed as conditionally independent.) Using the example in the earlier diagram, we don’t know the timelines for FAI or UFAI with certainty. It may be 5 or 50 years, or more, before one of the stable futures is achieved. Perhaps society will adapt and self-correct towards developing the requisite safe and beneficial AGI technology, or perhaps safety and preparation will be neglected. These uncertainties and events can be modeled in the space, with trajectories passing through positions with various probabilities.
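
The following sketch shows one way to operationalize this: trajectories as a random walk with drift, with the probability of each stable future estimated by Monte Carlo as the fraction of sampled paths that first cross the corresponding axis. The drift, volatility, starting position, and horizon are illustrative assumptions only, not claims about actual AI timelines.

```python
import numpy as np

def outcome_probabilities(start, drift, volatility, n_steps=1500, n_runs=2000, seed=0):
    """Estimate the probability of first reaching the UFAI (x <= 0) or FAI (y <= 0) boundary.

    Trajectories are modeled as a discrete random walk with drift: at each step the
    position moves by `drift` plus Gaussian noise scaled by `volatility`. All parameters
    here are illustrative assumptions, not estimates from the article.
    """
    rng = np.random.default_rng(seed)
    counts = {"UFAI": 0, "FAI": 0, "neither": 0}
    for _ in range(n_runs):
        pos = np.array(start, dtype=float)
        for _ in range(n_steps):
            pos += np.asarray(drift) + volatility * rng.standard_normal(2)
            if pos[0] <= 0:            # distance to UFAI exhausted first
                counts["UFAI"] += 1
                break
            if pos[1] <= 0:            # distance to FAI exhausted first
                counts["FAI"] += 1
                break
        else:
            counts["neither"] += 1     # no stable future reached within the horizon
    return {k: v / n_runs for k, v in counts.items()}

# Example: drifting towards both outcomes, slightly faster towards UFAI (made-up numbers).
print(outcome_probabilities(start=(40.0, 60.0), drift=(-0.06, -0.05), volatility=0.5))
```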

Defining how to calculate speed over a trajectory

We need a method to calculate the speed of the trajectories. The speed of a trajectory is assumed to be determined primarily by the rate of technological development. Timeline models allow us to determine the speed of trajectories. The connection between state space coordinates and time is in reality non-trivial and possibly somewhat jumpy (unless smoothed out by uncertainty). For example, an important cause of a trajectory might be a few discrete insights, such that the world has sudden, big lurches along that trajectory at the moments when those insights are reached, but moves slowly at other times.

Trajectories as defined in the coordinate space do not have time directly associated with their shapes; instead, time is an implicit quantity. That is, distance in the coordinate space does not correspond uniformly to distance in time. The same trajectory can be traversed in more or less time.
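
A small sketch of this distinction: the same fixed path (40 units long, an arbitrary illustrative figure) can be traversed steadily or in insight-driven lurches, giving very different arrival times. The rates, insight years, and jump sizes below are hypothetical.

```python
import numpy as np

def progress_over_time(total_distance, base_rate, insight_times, insight_jumps, horizon):
    """Illustrative mapping from time to distance travelled along one fixed trajectory.

    Progress accrues slowly at `base_rate` units per year, with sudden lurches of size
    `insight_jumps[i]` in the (hypothetical) years listed in `insight_times`. The same
    geometric path is traversed in both profiles below; only the timing differs.
    """
    years = np.arange(0, horizon + 1)
    progress = base_rate * years.astype(float)
    for t, jump in zip(insight_times, insight_jumps):
        progress[years >= t] += jump
    return years, np.minimum(progress, total_distance)

# Two time profiles for the same 40-unit path: steady progress vs. insight-driven lurches.
profiles = {
    "steady": progress_over_time(40, base_rate=1.0, insight_times=[], insight_jumps=[], horizon=50),
    "lurchy": progress_over_time(40, base_rate=0.3, insight_times=[15, 35], insight_jumps=[12, 14], horizon=50),
}
for name, (years, progress) in profiles.items():
    arrived = years[progress >= 40]
    print(name, "reaches the stable future in year", arrived[0] if arrived.size else "beyond the horizon")
```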

In general, there are several ways to calculate the expected time until we have a certain technology. Expert surveys, trend extrapolation, and simulation are three useful tools that can be used for this purpose.
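
As an example of the trend-extrapolation approach, one could fit a log-linear trend to some capability metric and read off when it crosses a threshold associated with the technology in question. The capability index and threshold below are synthetic and purely illustrative; the sketch is not an endorsement of any particular metric.

```python
import numpy as np

def years_until_threshold(years, capability, threshold):
    """Fit a log-linear trend to a capability metric and estimate when it crosses `threshold`.

    A deliberately simple stand-in for the trend-extrapolation approach; the inputs
    below are synthetic and purely illustrative.
    """
    slope, intercept = np.polyfit(years, np.log(capability), 1)
    crossing_year = (np.log(threshold) - intercept) / slope
    return crossing_year - years[-1]

# Synthetic capability index that roughly doubles every two years (made-up numbers).
years = np.array([2014, 2016, 2018, 2020, 2022])
capability = np.array([1.0, 2.1, 3.9, 8.2, 15.8])
print("estimated years remaining:", round(years_until_threshold(years, capability, threshold=1000.0), 1))
```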

Extensions

We see many ways to extend this research:

  • Shaping trajectories: Extending the modeling with systematization (starting by mapping interventions and constructing strategies) and visualization of interventions, to better understand how to shape trajectories and to provide new ideas for interventions.

  • Specialized versions: Making specialized versions of the space for each particular existential risk (such as AI risk, biorisk, nuclear war, etc.).

  • Trajectories and time: Further examining the relationships between the trajectories and time. Convergence has one mathematical model for AI timelines, and there is a family of approaches, each valid in certain circumstances. These can be characterized, simulated, and verified, and could help inspire interventions.

  • Measurability: Further increasing the measurability of our position and knowledge of the state space dynamics. We don’t know exactly where the world is in coordinates, how we are moving, or how fast we’re going, and we want to refine measurement to be less of an ad hoc process and more like engineering. For example, perhaps we can determine how far we are from AGI by projecting the needed computer power or the rate at which we’re having AGI insights. Proper measurement here is going to be subtle, because we cannot sample, and we don’t entirely know what the possible AGI designs are. But by measuring as best we can, we can verify dynamics and positions more accurately and so fine-tune our strategies.

  • Exploring geometries: Exploring variations on the geometry, such as different basis spaces or parametrizations of the state space, could provide us with new perspectives. Maybe there are invariants, symmetries, boundaries, or non-trivial topologies that can be modelled.

  • Larger spaces: Immersing the space in larger ones, like the full geometry of Turing machines, or state spaces that encode things like social dynamics, resource progress, or the laws of physics. This would allow us to track more dynamics or to see things more accurately.

  • Resource distribution: Using the state space model as part of a greater system that helps determine how to distribute resources. How does one build a system that handles the explore vs. exploit tradeoff properly, allows delegation and specialization, evaluates teams and projects clearly, self-improves in a reweighting way, allows interventions with different completion dates to be compared, incorporates unknown unknowns and hard-to-reverse-engineer, data-rich intuitions cleanly, and doesn’t suffer from decay, Goodhart’s law, or the principal-agent problem? Each of these questions needs investigation.

  • Trajectories simulator: Implementing a software platform for the trajectories model to allow exploration, learning, and experimentation using different scenarios and ways of modeling.

Conclusion

The state space model of x-risk trajectories can help us think about and visualise trajectories of the world, in relation to existential risk, in a more precise and structured manner. The state space model is intended to be a stepping stone to further formalizations and “mechanizations” of strategic matters on reducing existential risk. We think this kind of mindset is rarely applied to such “strategic” questions, despite potentially being very useful for them. There are of course drawbacks to this kind of approach as well; in particular, it won’t do much good if it isn’t calibrated or combined with more applied work. We see the potential to build a synergistic set of tools to generate, map out, test, and explore timelines, interventions, and outcomes, and to show how these all interact. We intend to follow this article up with other formalizations, insights, and project ideas that seem promising. This is a work in progress; thoughts and comments are most welcome.

We wish to thank Michael Aird, Andrew Stewart, Jesse Liptrap, Ozzie Gooen, Shri Samson, and Siebe Rozendal for their many helpful comments and suggestions on this document.

Bibliography

[1]. Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2017). When Will AI Exceed Human Performance? Evidence from AI Experts, 1–21. https://arxiv.org/abs/1705.08807
[2]. https://nickbostrom.com/papers/survey.pdf
[3]. https://www.eff.org/ai/metrics
[4]. http://theuncertainfuture.com
[5]. OpenPhil: What Do We Know about AI Timelines? https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/ai-timelines
[6]. Barrett, A. M., & Baum, S. D. (2016). A model of pathways to artificial superintelligence catastrophe for risk and decision analysis. Journal of Experimental & Theoretical Artificial Intelligence, 1–21. https://doi.org/10.1080/09528130701472416
[7]. http://aleph.se/andart2/math/adding-cooks-to-the-broth/
[8]. Cotton-Barratt, O., Daniel, M., & Sandberg, A. (2020). Defence in Depth Against Human Extinction: Prevention, Response, Resilience, and Why They All Matter. https://onlinelibrary.wiley.com/doi/full/10.1111/1758-5899.12786
[9]. Baum, S. D., et al. (2019). Long-Term Trajectories of Human Civilization. http://gcrinstitute.org/papers/trajectories.pdf
[10]. Bostrom, N. (2013). Existential risk prevention as global priority. Global Policy, 4(1), 15–31. https://doi.org/10.1111/1758-5899.12002


  1. How does the state space of x-risk trajectories model compare to the trajectory visualizations in [9]? The axes are almost completely different. Their trajectory graphs have an axis for time; the state space doesn’t. Their graphs have an axis for population size; the state space doesn’t. In the state space, each axis represents a stably bad or a stably good future. Though, in the visualizations in [9], hitting the x-axis represents extinction, which maps somewhat to hitting the axis of a stably bad future in the trajectories model. The visualizations in [9] illustrate valuable ideas, but they seem to be less about choices or interventions than the state space model is. ↩︎