AI Safety Research Camp—Project Proposal

→ Give your feedback on our plans below or in the Google Doc
Apply to take part in the Gran Canaria camp on 12-22 April (deadline: 12 February)
Join the Facebook group


Aim: Effi­ciently launch as­piring AI safety and strat­egy re­searchers into con­crete pro­duc­tivity by cre­at­ing an ‘on-ramp’ for fu­ture re­searchers.


  1. Get peo­ple started on and im­mersed into con­crete re­search work in­tended to lead to pa­pers for pub­li­ca­tion.

  2. Ad­dress the bot­tle­neck in AI safety/​strat­egy of few ex­perts be­ing available to train or or­ga­nize as­piring re­searchers by effi­ciently us­ing ex­pert time.

  3. Create a clear path from ‘in­ter­ested/​con­cerned’ to ‘ac­tive re­searcher’.

  4. Test a new method for boot­strap­ping tal­ent-con­strained re­search fields.

Method: Run an online research group culminating in a two-week intensive in-person research camp. Participants will work in groups on tightly defined research projects on the following topics:

  • Agent foundations

  • Ma­chine learn­ing safety

  • Policy & strategy

  • Hu­man values

Pro­jects will be pro­posed by par­ti­ci­pants prior to the start of the pro­gram. Ex­pert ad­vi­sors from AI Safety/​Strat­egy or­gani­sa­tions will help re­fine them into pro­pos­als that are tractable, suit­able for this re­search en­vi­ron­ment, and an­swer cur­rently un­solved re­search ques­tions. This al­lows for time-effi­cient use of ad­vi­sors’ do­main knowl­edge and re­search ex­pe­rience, and en­sures that re­search is well-al­igned with cur­rent pri­ori­ties.

Participants will then split into online collaborative groups to work on these research questions over a period of several months. This period will culminate in a two-week in-person research camp aimed at turning this exploratory research into first drafts of publishable research papers. This will also allow for cross-disciplinary conversations and community building, although the goal is primarily research output. Following the two-week camp, advisors will give feedback on manuscripts, guiding first drafts towards completion and advising on next steps for researchers.

Example: During the application process, multiple participants submit a research proposal on interruptibility, or otherwise express an interest in it and in working on machine learning-based approaches. During the initial idea generation phase, these researchers read one another’s research proposals and decide to collaborate based on their shared interests. They decide to code up and test a variety of novel approaches on the relevant AI safety gridworld. These approaches get formalised in a research plan.

This plan is cir­cu­lated among ad­vi­sors, who iden­tify the most promis­ing el­e­ments to pri­ori­tise and point out flaws that ren­der some pro­posed ap­proaches un­work­able. Par­ti­ci­pants feel en­couraged by ex­pert ad­vice and sup­port, and re­search be­gins on the im­proved re­search pro­posal.

Researchers begin formalising and coding up these approaches, sharing their work in a GitHub repository that they can use as evidence of their engineering ability. It becomes clear that a new gridworld is needed to investigate issues arising from research so far. After a brief conversation, their advisor is able to put them in touch with the relevant engineer at DeepMind, who gives them some useful tips on creating it.

At the re­search camp the par­ti­ci­pants are able to dis­cuss their find­ings and put them in con­text, as well as solve some tech­ni­cal is­sues that were im­pos­si­ble to re­solve part-time and re­motely. They write up their find­ings into a draft pa­per and pre­sent it at the end of the camp. The pa­per is read and com­mented on by ad­vi­sors, who give sug­ges­tions on how to im­prove the pa­per’s clar­ity. The pa­per is sub­mit­ted to NIPS 2018’s Aligned AI work­shop and is ac­cepted.

Ex­pected out­come: Each re­search group will aim to pro­duce re­sults that can form the ker­nel of a pa­per at the end of the July camp. We don’t ex­pect ev­ery group to achieve this, as re­search progress is hard to pre­dict.

  1. At the end of the camp, from five groups, we would ex­pect three to have ini­tial re­sults and a first draft of a pa­per that the ex­pert ad­vi­sors find promis­ing.

  2. Within six months fol­low­ing the camp, three or more draft pa­pers have been writ­ten that are con­sid­ered to be promis­ing by the re­search com­mu­nity.

  3. Within one year fol­low­ing the camp, three or more re­searchers who par­ti­ci­pated in the pro­ject ob­tain fund­ing or re­search roles in AI safety or strat­egy.

Next steps fol­low­ing the camp: When teams have pro­duced promis­ing re­sults, camp or­ga­niz­ers and ex­pert ad­vi­sors will en­deav­our to con­nect the teams to the right par­ties to help the re­search shape up fur­ther and be taken to con­clu­sion.

Pos­si­ble des­ti­na­tions for par­ti­ci­pants who wish to re­main in re­search af­ter the camp would likely be some com­bi­na­tion of:

  1. Full-time internships in areas of interest, for instance at DeepMind, FHI or CHAI

  2. Full-time re­search roles at AI safety/​strat­egy organisations

  3. Ob­tain­ing re­search fund­ing such as OpenPhil or FLI re­search grants—suc­cess­ful pub­li­ca­tions may un­lock new sources of funding

  4. In­de­pen­dent re­mote research

  5. Re­search en­g­ineer­ing roles at tech­ni­cal AI safety organisations

Research projects can be tailored towards participants’ goals. For instance, researchers who are interested in engineering or machine learning-related approaches to safety can structure a project to include a significant coding element, leading to (for instance) a GitHub repo that can be used as evidence of engineering skill. This is also a relatively easy way for people who are unsure whether research work is for them to try it out without the large time investment and opportunity cost of a PhD or master’s program, although we do not see it as a full replacement for these.


Timeline: We anticipate this project having four main phases (dates are currently open for discussion):

  1. Plan and de­velop the pro­ject, re­cruit re­searchers and look for ad­vi­sors—De­cem­ber 2017 to April 2018

  2. Test­ing and re­fine­ment of event de­sign dur­ing a small-scale camp at Gran Ca­naria—April 12-22

  3. Pro­ject se­lec­tion, re­fine­ment and ex­plo­ra­tion (on­line) - April 2018 to July 2018

  4. Re­search camp (in per­son) - July/​Au­gust 2018

Re­cruit­ing: We plan to have ap­prox­i­mately 20 re­searchers work­ing in teams of 3-5 peo­ple, with pro­jects in agent foun­da­tions, ma­chine learn­ing, strat­egy/​policy and hu­man val­ues/​cog­ni­tion. Based on re­sponses to a reg­is­tra­tion form we have already posted on­line (link here) we ex­pect to be able to eas­ily meet this num­ber of par­ti­ci­pants.

Each team will be advised by a more experienced researcher in the relevant area. However, we expect this won’t be as tightly coupled a relationship as that between PhD students and their supervisors: the aim is to maximise the usefulness of the relatively scarce advisor time and to develop as much independence in researchers as possible.

Pro­ject se­lec­tion and ex­plo­ra­tion: Once the ini­tial re­cruit­ment phase is com­plete, re­searchers and ad­vi­sors can choose a pro­ject to work on and re­fine it into a sin­gle ques­tion an­swer­able within the timeframe. We recog­nise the need for strong pro­ject plan­ning skills and care­ful pro­ject choice and re­fine­ment here, and this pro­ject choice is a po­ten­tial point of failure (see Im­por­tant Con­sid­er­a­tions be­low). Fol­low­ing pro­ject se­lec­tion, re­searchers will be­gin ex­plor­ing the re­search pro­ject they’ve cho­sen in the months be­tween pro­ject choice and the re­search camp. This would prob­a­bly re­quire five to ten hours a week of com­mit­ment from re­searchers, mostly asyn­chronously but with a weekly ‘scrum’ meet­ing to share progress within a pro­ject team. Reg­u­lar shar­ing of progress and for­ward plan­ning will be im­por­tant to keep mo­men­tum go­ing.

Re­search camp: Fol­low­ing the se­lec­tion and ex­plo­ra­tion, we will have a two-week in­ten­sive camp as­sem­bling all par­ti­ci­pants in-per­son at a re­treat to do fo­cused work on the re­search pro­jects. Ex­plo­ra­tory work can be done asyn­chronously, but finish­ing re­search pro­jects can be hard work and re­quire in­ten­sive com­mu­ni­ca­tion which can more eas­ily be done in per­son. This also makes the full-time el­e­ment of this pro­ject much more bounded and man­age­able for most po­ten­tial par­ti­ci­pants. An in-per­son meet­ing also al­lows for much bet­ter com­mu­ni­ca­tion be­tween re­searchers on differ­ent pro­jects, as well as helping form last­ing and fruit­ful con­nec­tions be­tween re­searchers.

Im­por­tant Considerations

Shap­ing the re­search ques­tion: Select­ing good re­search ques­tions for this pro­ject will be challeng­ing, and is one of the main po­ten­tial points of failure. The non-tra­di­tional struc­ture of the event brings with it some ex­tra con­sid­er­a­tions. We ex­pect that most pro­jects will be:

  1. Tractable to al­low progress to be made in a short pe­riod of time, rather than con­cep­tu­ally com­plex or open-ended

  2. Closely re­lated to cur­rent work, e.g. sug­ges­tions found in ‘fur­ther work’ or ‘open ques­tions’ sec­tions from re­cent papers

  3. Par­allelis­able across mul­ti­ple re­searchers, e.g. eval­u­at­ing mul­ti­ple pos­si­ble solu­tions to a sin­gle prob­lem or re­search­ing sep­a­rate as­pects of a policy proposal

This biases project selection towards incremental research, i.e. extending previous work rather than finding completely new approaches. This is hard to avoid in these circumstances, and we are optimising at least partly for the creation of new researchers who can go on to do more risky, less incremental research in the future. Furthermore, a look at the ‘future work/open questions’ sections of many published safety papers will reveal a broad selection of interesting, useful questions that still meet the criteria above. So although this is a tradeoff, we do not expect it to be overly limiting. A good example of this in the machine learning subfield would be evaluating multiple approaches to one of the problems listed in DeepMind’s recent AI Safety Gridworlds paper.

Find­ing ad­vi­sors: Although we in­tend this to be rel­a­tively self-con­tained, some amount of ad­vice from ac­tive re­searchers will be benefi­cial at both the pro­ject se­lec­tion and re­search stages, as well as at the end of the camp. The most use­ful pe­ri­ods for ad­vi­sor in­volve­ment will be at the ini­tial pro­ject se­lec­tion/​shap­ing phase and at the end of the camp—the former al­lows for bet­ter, more tractable pro­jects as well as con­vey­ing pre­vi­ously un­pub­lished rele­vant in­for­ma­tion and a sense of what’s con­sid­ered in­ter­est­ing. The lat­ter will be use­ful for prepar­ing pa­pers and in­te­grat­ing new re­searchers into the ex­ist­ing com­mu­nity. In­for­mal en­quiries sug­gest that it is likely to be pos­si­ble to re­cruit ad­vi­sors for these stages, but on­go­ing com­mit­ments will be more challeng­ing.

The ex­pected com­mit­ment dur­ing pro­ject se­lec­tion and shap­ing would be one or two ses­sions of sev­eral hours spent eval­u­at­ing and com­ment­ing on pro­posed re­search pro­jects. This could be done asyn­chronously or by video chat. Com­mit­ment at the end of the re­search camp is likely to be similar—re­spond­ing to ini­tial drafts of pa­pers with sug­ges­tions of im­prove­ments or fur­ther re­search in a similar way to the peer re­view pro­cess.

Costs: The main costs for the Gran Canaria camp (the AirBnBs, meals and low-income travel reimbursements) have now been covered by two funders. The July camp will likely take place in the UK at the EA Hotel, a co-working hub planned by Greg Colbourn (for other options, see here). For this, we will publish a funding proposal around April. Please see here for the draft budgets.

Long-term and wider impacts

If the camp proves to be successful, it could serve as the foundation for yearly recurring camps to keep boosting aspiring researchers into productivity. It could become a much-needed additional lever to grow the fields of AI safety and AI strategy for many years to come. The research camp model could also be used to grow AI safety research communities in places where none presently exist but there is a strong need, for instance in China. By using experienced coordinators and advisors in conjunction with local volunteers, it may be possible to organise a research camp without the need for pre-existing experts in the community. A camp provides a coordination point for interested participants, signals support for community building and, if previous camps have been successful, provides social proof for participants.

In ad­di­tion, scal­ing up re­search into rel­a­tively new cause ar­eas is a prob­lem that will need to be solved many times in the effec­tive al­tru­ist com­mu­nity. This could rep­re­sent an effi­cient way to ‘boot­strap’ a larger re­search com­mu­nity from a small pre-ex­ist­ing one, and so could be a use­ful ad­di­tion to the tool set available to the EA com­mu­nity.

This project serves as a natural complement to other AI safety projects currently in development, such as RAISE, that aim to teach researchers the foundational knowledge they will need to begin research. Once an aspiring AI safety researcher completes one of these courses, they might consider a research camp as a natural next step on the road to becoming a practicing researcher.


Thanks to Ryan Carey, Chris Cundy, Vic­to­ria Krakovna and Matthijs Maas for read­ing and pro­vid­ing helpful com­ments on this doc­u­ment.


Tom McGrath

Tom is a maths PhD stu­dent in the Sys­tems and Sig­nals group at Im­pe­rial Col­lege, where he works on statis­ti­cal mod­els of an­i­mal be­havi­our and phys­i­cal mod­els of in­fer­ence. He will be in­tern­ing at the Fu­ture of Hu­man­ity In­sti­tute from Jan 2018, work­ing with Owain Evans. His pre­vi­ous or­gani­sa­tional ex­pe­rience in­cludes co-run­ning Im­pe­rial’s Maths Helpdesk and run­ning a post­grad­u­ate deep learn­ing study group.

Remmelt Ellen

Remmelt is the Operations Manager of Effective Altruism Netherlands, where he coordinates national events, works with organisers of new meetups and takes care of mundane admin work. He also oversees planning for the team at RAISE, an online AI Safety course. He is a Bachelor intern at the Intelligent & Autonomous Systems research group.

In his spare time, he’s ex­plor­ing how to im­prove the in­ter­ac­tions within multi-lay­ered net­works of agents to reach shared goals – es­pe­cially ap­proaches to col­lab­o­ra­tion within the EA com­mu­nity and the rep­re­sen­ta­tion of per­sons and in­ter­est groups by ne­go­ti­a­tion agents in sub-ex­po­nen­tial take­off sce­nar­ios.

Linda Linsefors

Linda has a PhD in the­o­ret­i­cal physics, which she ob­tained at Univer­sité Greno­ble Alpes for work on loop quan­tum grav­ity. Since then she has stud­ied AI and AI Safety on­line for about a year. Linda is cur­rently work­ing at In­te­grated Science Lab in Umeå, Swe­den, de­vel­op­ing tools for analysing in­for­ma­tion flow in net­works. She hopes to be able to work full time on AI Safety in the near fu­ture.

Nandi Schoots

Nandi has a re­search mas­ter in pure math­e­mat­ics and a minor in psy­chol­ogy from Lei­den Univer­sity. Her mas­ter was fo­cused on alge­braic ge­om­e­try and her the­sis was in cat­e­gory the­ory. Since grad­u­at­ing she has been steer­ing her ca­reer in the di­rec­tion of AI safety. She is cur­rently em­ployed as a data sci­en­tist in the Nether­lands. In par­allel to her work she is part of a study group on AI safety and in­volved with the re­in­force­ment learn­ing sec­tion of RAISE.

David Kristoffersson

David has a back­ground as R&D Pro­ject Man­ager at Eric­s­son where he led a pro­ject of 30 ex­pe­rienced soft­ware en­g­ineers de­vel­op­ing many-core soft­ware de­vel­op­ment tools. He li­aised with five in­ter­nal stake­holder or­gani­sa­tions, worked out strat­egy, made high-level tech­ni­cal de­ci­sions and co­or­di­nated a dis­parate set of sub­pro­jects spread over seven cities on two differ­ent con­ti­nents. He has a fur­ther back­ground as a Soft­ware Eng­ineer and has a BS in Com­puter Eng­ineer­ing. In the past year, he has con­tracted for the Fu­ture of Hu­man­ity In­sti­tute, has ex­plored re­search pro­jects in ML and AI strat­egy with FHI re­searchers, and is cur­rently col­lab­o­rat­ing on ex­is­ten­tial risk strat­egy re­search with Con­ver­gence.

Chris Pasek

After graduating in mathematics and theoretical computer science, Chris ended up touring the world in search of meaning and self-improvement, and finally settled on working as a freelance researcher focused on AI alignment. He is currently also running a rationalist shared housing project on the tropical island of Gran Canaria and continuing to look for ways to gradually self-modify in the direction of a superhuman FDT-consequentialist entity with a goal to save the world.
