Technical AGI safety research outside AI

I think there are many questions whose answers would be useful for technical AGI safety research, but which will probably require expertise outside AI to answer. In this post I list 30 of them, divided into four categories. Feel free to get in touch if you'd like to discuss these questions and why I think they're important in more detail. I personally think that making progress on the ones in the first category is particularly vital, and plausibly tractable for researchers from a wide range of academic backgrounds.

Studying and understanding safety problems

  1. How strong are the economic or technological pressures towards building very general AI systems, as opposed to narrow ones? How plausible is the CAIS model of advanced AI capabilities arising from the combination of many narrow services?

  2. What are the most compelling arguments for and against discontinuous versus continuous takeoffs? In particular, how should we think about the analogy from human evolution, and the scalability of intelligence with compute?

  3. What are the tasks via which narrow AI is most likely to have a destabilising impact on society? What might cybercrime look like when many important jobs have been automated?

  4. How plausible are safety concerns about economic dominance by influence-seeking agents, as well as structural loss of control scenarios? Can these be reformulated in terms of standard economic ideas, such as principal-agent problems and the effects of automation?

  5. How can we make the concepts of agency and goal-directed behaviour more specific and useful in the context of AI (e.g. building on Dennett's work on the intentional stance)? How do they relate to intelligence and the ability to generalise across widely different domains?

  6. What are the strongest arguments that have been made about why advanced AI might pose an existential threat, stated as clearly as possible? How do the different claims relate to each other, and which inferences or assumptions are weakest?

Solving safety problems

  1. What techniques used in studying animal brains and behaviour will be most helpful for analysing AI systems and their behaviour, particularly with the goal of rendering them interpretable?

  2. What is the most important information about deployed AI that decision-makers will need to track, and how can we create interfaces which communicate this effectively, making it visible and salient?

  3. What are the most effective ways to gather huge numbers of human judgments about potential AI behaviour, and how can we ensure that such data is high-quality?

  4. How can we empirically test the debate and factored cognition hypotheses? How plausible are the assumptions about the decomposability of cognitive work via language which underlie debate and iterated distillation and amplification?

  5. How can we distinguish between AIs helping us better understand what we want and AIs changing what we want (both as individuals and as a civilisation)? How easy is the latter to do, and how easy is it for us to identify?

  6. Various questions in decision theory, logical uncertainty and game theory relevant to agent foundations.

  7. How can we create secure containment and supervision protocols to use on AI, which are also robust to external interference?

  8. What are the best communication channels for conveying goals to AI agents? In particular, which ones are most likely to incentivise optimisation of the goal specified through the channel, rather than modification of the communication channel itself?

  9. How closely linked is the human motivational system to our intellectual capabilities, i.e. to what extent does the orthogonality thesis apply to human-like brains? What can we learn from the range of variation in human motivational systems (e.g. induced by brain disorders)?

  10. What were the features of the human ancestral environment and evolutionary “training process” that contributed the most to our empathy and altruism? What are the analogues of these in our current AI training setups, and how can we increase them?

  11. What are the features of our current cultural environments that contribute the most to altruistic and cooperative behaviour, and how can we replicate these while training AI?

Forecasting AI

  1. What are the most likely pathways to AGI, and the milestones and timelines involved?

  2. How do our best systems so far compare to animals and humans, both in terms of performance and in terms of brain size? What do we know from animals about how cognitive abilities scale with brain size, learning time, environmental complexity, etc.?

  3. What are the economics and logistics of building microchips and datacenters? How will the availability of compute change under different demand scenarios?

  4. In what ways is AI usefully analogous or disanalogous to the industrial revolution, electricity, and nuclear weapons?

  5. How will the progression of narrow AI shape public and government opinions and narratives towards it, and how will that influence the directions of AI research?

  6. Which tasks will there be most economic pressure to automate, and how much money might realistically be involved? What are the biggest social or legal barriers to automation?

  7. What are the most salient features of the history of AI, and how should they affect our understanding of the field today?


Meta

  1. How can we best grow the field of AI safety? See OpenPhil’s notes on the topic.

  2. How can we spread norms in favour of careful, robust testing and other safety measures in machine learning? What can we learn from other engineering disciplines with strict standards, such as aerospace engineering?

  3. How can we create infrastructure to improve our ability to accurately predict the future development of AI? What are the bottlenecks facing tools like Foretold and Metaculus, and preventing effective prediction markets from existing?

  4. How can we best increase communication and coordination within the AI safety community? What are the major constraints that safety faces on sharing information (in particular ones which other fields don’t face), and how can we overcome them?

  5. What norms and institutions should the field of AI safety import from other disciplines? Are there predictable problems that we will face as a research community, or systemic biases which are making us overlook things?

  6. What are the biggest disagreements between safety researchers? What’s the distribution of opinions, and what are the key cruxes?

Particular thanks to Beth Barnes and a discussion group at the CHAI retreat for helping me compile this list.