CFAR: Progress Report & Future Plans

Con­text: This is the first in a se­ries of year-end up­dates from CFAR. This post mainly de­scribes what CFAR did over the past two years, and what it plans to do next year; other posts in the se­ries will de­scribe more about its mis­sion, fi­nan­cials, and im­pact.

(12/29/19: Edited to mention CFAR’s mistakes regarding Brent, and our plans to publicly release our handbook.)

Progress in 2018 and 2019

For the last three years, CFAR held its total number of workshop-days[1] roughly constant.

But the dis­tri­bu­tion of work­shops has changed con­sid­er­ably. Most no­tably, in 2018 we started co-run­ning AIRCS with MIRI, and since then we’ve been al­lo­cat­ing roughly ⅓ of our staff time[2] to this pro­ject.[3]


AI Risk for Com­puter Scien­tists (AIRCS) work­shops are de­signed to re­cruit and train pro­gram­mers for roles do­ing tech­ni­cal al­ign­ment re­search. The pro­gram name isn’t perfect: while most at­ten­dees are pro­gram­mers, it also ac­cepts math­e­mat­i­ci­ans, physi­cists, and oth­ers with sig­nifi­cant tech­ni­cal skill.

We co-run these work­shops with MIRI—the staff is mostly a mix of cur­rent and former CFAR and MIRI staff, and the cur­ricu­lum is roughly half ra­tio­nal­ity ma­te­rial and half ob­ject-level con­tent about the al­ign­ment prob­lem. So far, MIRI has hired six AIRCS grad­u­ates (more are in the pipeline). While these work­shops are de­signed pri­mar­ily to benefit MIRI’s re­search pro­grams, the cur­ricu­lum in­cludes a va­ri­ety of other per­spec­tives (e.g. Paul Chris­ti­ano’s IDA), and AIRCS grad­u­ates have also sub­se­quently ac­cepted po­si­tions at other al­ign­ment orgs (e.g. FHI, Ought, and the OpenAI al­ign­ment team).

We’re excited about these workshops. So far, 12 AIRCS graduates have subsequently taken jobs or internships doing technical safety research,[4] and from what we can tell, AIRCS played a non-trivial causal role[5] in 9 of these outcomes. AIRCS also seems to help participants learn to reason better about the alignment problem, and to give MIRI greater familiarity with candidates (thus hopefully improving the average quality of hires).

Over­all, we’ve been im­pressed by the re­sults of this pro­gram so far. We’re also ex­cited about the cur­ricu­lar de­vel­op­ment that’s oc­curred at AIRCS, parts of which have been mak­ing their way into our cur­ricu­lum for other pro­grams.

Euro­pean Workshops

Another notable strategic change over this period was that we decided to start running many more workshops in Europe. In total, 23% of the workshop days CFAR ran or co-ran in 2018 and 2019 occurred in Europe. These programs included three mainlines, a mentorship training, one leg of our instructor training, a workshop at FHI featuring members of their new Research Scholars Program, and ESPR 2018.[6]

Aside from ESPR and the FHI work­shop, all of these pro­grams were held in the Czech Repub­lic. This is partly be­cause we’re ex­cited about the Czech ra­tio­nal­ist/​EA scene, and partly be­cause the Czech As­so­ci­a­tion for Effec­tive Altru­ism does a fan­tas­tic job of helping us with ops sup­port, mak­ing it lo­gis­ti­cally eas­ier for us to run work­shops abroad.

In 2019, we spun off ESPR (the Euro­pean Sum­mer Pro­gram on Ra­tion­al­ity) into a new in­de­pen­dent or­ga­ni­za­tion run by Jan Kul­veit, Se­nior Re­search Scholar in FHI’s Re­search Schol­ars Pro­gram and one of the ini­tial or­ga­niz­ers of the Czech ra­tio­nal­ity/​EA com­mu­nity. We sent Eliz­a­beth Gar­rett to help with the tran­si­tion, but Jan or­ga­nized and ran the pro­gram in­de­pen­dently, and from what we can tell the pro­gram went splen­didly.

In­struc­tor Training

We ran our sec­ond-ever in­struc­tor train­ing pro­gram this year, led by Eli Tyre and Brienne Yud­kowsky. This pro­gram seems to me to have gone well. I think this is prob­a­bly partly be­cause the qual­ity of the co­hort was high, and partly be­cause Eli and Brienne’s cur­ricu­lum seemed to man­age (to a sur­pris­ing-to-all-of-us de­gree) to achieve its aim of helping peo­ple do gen­er­a­tive, origi­nal see­ing-type think­ing and re­search. Go­ing for­ward, we’ll be grate­ful and proud to have mem­bers of this co­hort as guest in­struc­tors.

In­ter­nal Improvements

CFAR’s internal processes and culture seem to me to have improved markedly over the past two years. We made a number of administrative changes: for example, we improved our systems for ops, budgets, finances, and internal coordination, and acquired a permanent venue, which reduces both the financial cost and the time cost of running workshops. And after spending significant time investigating our mistakes with regard to Brent, we reformed our hiring, admissions, and conduct policies to reduce the likelihood that such mistakes recur.

It also seems to me that our in­ter­nal cul­ture and morale im­proved over this time. Com­pared with two years ago, my im­pres­sion is that our staff trust and re­spect each other more—that there’s more ca­ma­raderie, and more sense of shared strate­gic ori­en­ta­tion.

Plans for 2020

From the out­side, I ex­pect CFAR’s strat­egy will look roughly the same next year as it has for the past two years. We plan to run a similar dis­tri­bu­tion of work­shops, to keep re­cruit­ing and train­ing tech­ni­cal al­ign­ment re­searchers (es­pe­cially for/​with MIRI, al­though we do try to help other orgs with re­cruit­ing too, where pos­si­ble), and to con­tinue prop­a­gat­ing epistemic norms in­tended to help peo­ple rea­son more sanely about ex­is­ten­tial risk—to de­velop the skills re­quired to no­tice, rather than ig­nore, cru­cial con­sid­er­a­tions; to avoid tak­ing rash ac­tions with large effect but un­clear sign; to model and de­scribe even sub­tle in­tu­itions with rigor.

So far, I think this strategy has proved impactful. I’m proud of what our team has accomplished, both in terms of measurable impact on the AI safety landscape (e.g. counterfactual additional researchers), and in terms of harder-to-measure effects on the surrounding culture (e.g. the propagation of the term “crux”).

That said, I think CFAR still strug­gles with some pretty ba­sic stuff. For ex­am­ple:

  • His­tor­i­cally, I think CFAR has been re­ally quite bad at ex­plain­ing its goals, strat­egy, and mechanism of im­pact—not just to fun­ders, and to EA at large, but even to each other. I reg­u­larly en­counter peo­ple who, even af­ter ex­ten­sive in­ter­ac­tion with CFAR, have se­ri­ously mis­taken im­pres­sions about what CFAR is try­ing to achieve. I think this situ­a­tion is partly due to our im­pact model be­ing some­what un­usu­ally difficult to de­scribe (es­pe­cially the non-ca­reer-change-re­lated parts), but mostly due to us hav­ing done a poor job of com­mu­ni­cat­ing.[7]

  • Our staff burn out reg­u­larly. Work­ing at CFAR is in­tense: dur­ing work­shops, staff spend nearly all their wak­ing hours work­ing, and they typ­i­cally spend about ⅓ of their work­days at work­shops. Between work­shops and trav­el­ing, it can be hard to e.g. spend enough time with loved ones. And since many-per­son events are by na­ture, I think, some­what un­pre­dictable, our staff semi-reg­u­larly have to stay up late into the night fix­ing some sud­den prob­lem or an­other. Of the 13 long-term em­ploy­ees who have left CFAR to date, 5 re­ported leav­ing at least in sub­stan­tial part due to burnout. Even aside from con­cerns about burnout, our cur­rent work­shop load is suffi­ciently high that our staff have lit­tle time left to learn, im­prove, and be­come bet­ter at their jobs. We’re ex­plor­ing a va­ri­ety of strate­gies in­tended to miti­gate these prob­lems—for ex­am­ple, we’re con­sid­er­ing work­ing in three 3-month sprints next year, and col­lec­tively tak­ing one-month sab­bat­i­cals in be­tween each sprint to learn, work on per­sonal pro­jects and recharge.

  • We have limited ability to track the effects of our workshops. Sometimes the effects are obvious, like when people tell us they changed careers because of a workshop; other times we have strong reason to suspect an effect exists (e.g. many participants reporting that somehow, the workshop caused them to feel comfortable seriously considering working on alignment), but we’re sufficiently unclear about the details of the mechanism of impact that it’s difficult to know how exactly to improve. And even in the clearest-cut cases, it’s hard to confidently estimate counterfactuals: neither we nor the participants, I think, have a great ability to predict what would have occurred instead in a world where they’d never encountered CFAR.

In 2020, we plan to de­vote sig­nifi­cant effort to­ward try­ing to miti­gate these and similar prob­lems. As­sum­ing we can find a suit­able can­di­date, one step we hope to take is hiring some­one to work alongside Dan Keys to help us gather and an­a­lyze more data. We’re also form­ing an out­side ad­vi­sory board—this board won’t have for­mal de­ci­sion-mak­ing power, but it will have a man­date to re­view (and pub­lish pub­lic re­ports on) our progress. This board is an ex­per­i­ment, which by de­fault will last one year (al­though we may dis­band it sooner if it turns out to cost too much time, or to in­cen­tivize us to good­hart, etc). But my guess is that both their ad­vice, and the ex­pec­ta­tion that we’ll reg­u­larly ex­plain our strat­egy and de­ci­sions to them, will cause CFAR to be­come more strate­gi­cally ori­ented—and hope­fully more leg­ibly so!—over the com­ing year.

(We’re also planning to publicly release our workshop handbook, which we’ve previously only given to alumni. Our reservations about distributing it more widely have faded over time, many people have asked us for a public version, and we’ve become increasingly interested in getting more public criticism and engagement with our curriculum. So we decided to run the experiment.)

We’re hiring, and fundrais­ing!

From what I can tell, CFAR’s im­pact so far has been high. For ex­am­ple, of the 66 tech­ni­cal al­ign­ment re­searchers at MIRI, CHAI, FHI, and the Deep­Mind and OpenAI al­ign­ment teams[8], 38 have at­tended a work­shop of some sort, and work­shops seem to have had some non-triv­ial causal role in at least 15 of these ca­reers.

I also sus­pect CFAR has other forms of im­pact—for ex­am­ple, that our cur­ricu­lum helps peo­ple rea­son more effec­tively about ex­is­ten­tial risk, that AIRCS and MSFP give a kind of tech­ni­cal ed­u­ca­tion into al­ign­ment re­search that is both valuable and still rel­a­tively rare, and that the com­mu­nity-build­ing effects of our pro­grams prob­a­bly cause use­ful so­cial con­nec­tions to form that we wouldn’t gen­er­ally hear about. Our ev­i­dence on these points is less leg­ible, but per­son­ally, my guess would be that non-ca­reer-change-type im­pacts like these ac­count for some­thing like half of CFAR’s to­tal im­pact.

While our work­shops have, in ex­pec­ta­tion, helped cause ad­di­tional re­search hires to other tech­ni­cal safety orgs, and one might rea­son­ably be en­thu­si­as­tic about CFAR for rea­sons aside from the ca­reer changes it’s helped cause, I do think our im­pact has dis­pro­por­tionately benefited MIRI’s re­search pro­grams. So if you think MIRI’s work is low-value, I think it makes sense to de­crease your es­ti­mate of CFAR’s im­pact ac­cord­ingly. But es­pe­cially if MIRI re­search is one of the things you’d like to see con­tinue to grow, I think CFAR rep­re­sents one of the best available bets for turn­ing dol­lars (and staff time) into ex­is­ten­tial risk re­duc­tion.

CFAR could benefit from additional money (and staff) to an unusual degree this year. We are not in urgently dire straits in either regard: we have about 12 months of runway, accounting for pending grant payments, and our current staff size is sufficient to continue running lots of workshops. But our staff size is considerably smaller than it was in 2017 and 2018, and we expect that we could fairly easily turn additional staff into additional workshop output. Also, our long-term sources of institutional funding are uncertain, which makes it harder than usual for us to make long-term plans and commitments, and increases the portion of time we spend trying to raise funds relative to running/improving workshops. So I expect CFAR would benefit from additional dollars and staff somewhat more than in recent years. If you’d like to support us, donate here, and if you’re interested in applying for a job, reach out!

  1. This to­tal in­cludes both work­shops we co-run (like AIRCS and MSFP) and work­shops we run our­selves (like main­lines and men­tor­ship work­shops). ↩︎

  2. CFAR’s al­lo­ca­tion of staff time by goal roughly ap­prox­i­mates its dis­tri­bu­tion of work­shops by type, since most of its non-work­shop staff time is spent pur­su­ing goals in­stru­men­tal to run­ning work­shops (prepar­ing/​de­sign­ing classes, do­ing ad­mis­sions, etc). ↩︎

  3. To­gether with MSFP, this means about 50% of our work­shop out­put over the past two years has been fo­cused speci­fi­cally on re­cruit­ing and train­ing peo­ple for roles in tech­ni­cal safety re­search. ↩︎

  4. Of these, three even­tu­ally left these po­si­tions—two were in­tern­ships, one was a full-time job—and are not cur­rently em­ployed as al­ign­ment re­searchers, al­though we ex­pect some may take similar po­si­tions again in the fu­ture. We also ex­pect ad­di­tional ex­ist­ing AIRCS alumni may even­tu­ally find their way into al­ign­ment re­search; past pro­grams of this sort have of­ten had effects that were a cou­ple years de­layed. ↩︎

  5. More de­tails on these es­ti­mates com­ing soon in our 2019 Im­pact Data doc­u­ment. ↩︎

  6. The Czech As­so­ci­a­tion for Effec­tive Altru­ism also hosted a Euro­pean CFAR Alumni Re­u­nion in 2018, at­tended by 83 peo­ple. ↩︎

  7. I think our name prob­a­bly makes this prob­lem worse—that peo­ple some­times hear us as at­tempt­ing to claim some­thing like omni-do­main com­pe­tence, and/​or that our mis­sion is to make the pop­u­la­tion at large more ra­tio­nal in gen­eral. In fact, our cur­ricu­lar aims are much nar­rower: to prop­a­gate a par­tic­u­lar set of epistemic norms among a par­tic­u­lar set of peo­ple. We plan to ex­plore ways of re­solv­ing this con­fu­sion next year, in­clud­ing by chang­ing our name if nec­es­sary. ↩︎

  8. “Tech­ni­cal al­ign­ment re­searchers” is a fuzzy cat­e­gory, and ob­vi­ously in­cludes peo­ple out­side these five or­ga­ni­za­tions; this group just struck me as a fairly nat­u­ral, clearly-delineated cat­e­gory. ↩︎