Oxford Prioritisation Project Review

By Ja­cob Lager­ros and Tom Sit­tler.

Cross-posted to Tom’s web­site.

Short summary

The Oxford Prioritisation Project was a research group that ran between January and May 2017. The team conducted research on how to allocate £10,000 in the most impactful way, and published all its work on our blog. Tom Sittler was the Project’s director and Jacob Lagerros was its secretary, closely supporting Tom. This document is our (Jacob and Tom’s) in-depth review and impact evaluation of the Project. Our main conclusions are that the Project was an exciting and ambitious experiment with a new form of EA volunteer work, and that, although its impact fell short of our expectations in many areas, we learned an enormous amount and produced useful quantitative models.


  1. Short summary

  2. Contents

  3. Ex­ec­u­tive summary

  4. The im­pact of the Pro­ject

    1. What were the goals? How did we perform on them?

      1. 1: Pub­lish on­line doc­u­ments de­tailing con­crete pri­ori­ti­sa­tion reasoning

      2. 2: Pri­ori­ti­sa­tion researchers

      3. 3: Train­ing for earn-to-givers

      4. 4: Give lo­cal groups some­thing to do

      5. 5: Lo­cal group epistemics

      6. 6: The value of information of the Oxford Prioritisation Project

    2. Learn­ing value for Ja­cob and Tom

    3. What were the costs of the pro­ject?

      1. Stu­dent time

      2. CEA money and time

  5. Main challenges

    1. Differ­ent lev­els of pre­vi­ous ex­pe­rience

      1. Heterogeneous starting points

      2. Con­tinued heterogeneity

    2. Team breakdown

  6. Should there be more similar pro­jects? Les­sons for repli­ca­tion

    1. Did the Pro­ject achieve pos­i­tive im­pact?

      1. Costs and benefits

      2. Tom’s feelings

    2. Things we would ad­vise chang­ing if the pro­ject were repli­cated

      1. Less ambition

      2. Shorter duration

      3. Use a smaller grant if it seems easier

      4. Fo­cus on quan­ti­ta­tive mod­els from the beginning

      5. More homogeneous team

      6. Smaller team

  7. More gen­eral up­dates about epistemics, teams, and com­mu­nity

    1. The epistemic at­mo­sphere of a group will be more truth-seek­ing when a large dona­tion is con­di­tional on its perfor­mance.

    2. A major risk to the project is that people hold on too strongly to their pre-Project views

    3. A large ma­jor­ity of team ap­pli­cants would be peo­ple we know per­son­ally.

Ex­ec­u­tive summary

A num­ber of paths for im­pact mo­ti­vated this pro­ject, fal­ling roughly into two cat­e­gories: pro­duc­ing valuable re­search (both to in­form and to in­spire) and em­pow­er­ing peo­ple (by mak­ing them more knowl­edge­able, by im­prov­ing the lo­cal com­mu­nity…).

We feel that the Pro­ject’s im­pact fell short of our ex­pec­ta­tions in many ar­eas, es­pe­cially in em­pow­er­ing peo­ple but also in pro­duc­ing re­search. Yet we are proud of the Pro­ject, which was an ex­cit­ing and am­bi­tious ex­per­i­ment with a new form of EA vol­un­teer work. By launch­ing into this un­ex­plored space, we have pro­vided sig­nifi­cant value of in­for­ma­tion for our­selves and the EA com­mu­nity.

We be­lieve that we in­creased the pri­ori­ti­sa­tion skill of team mem­bers only to a small ex­tent (and con­cen­trated on one or two peo­ple), much less than we hoped. We en­coun­tered se­vere challenges with a het­ero­ge­neous team, and an even­tual team break­down that threat­ened the ex­is­tence of the Pro­ject.

On the other hand, we feel confident that we learned an enormous amount through the Project, including some things we couldn’t have learned any other way. This ranges from team management under strong time pressure and leadership in the face of uncertainty, to group epistemics and quantitative-model-building skills.

Re­search-wise, we are happy with our quan­ti­ta­tive mod­els, which we see as a mod­er­ately use­ful con­tri­bu­tion. We are less ex­cited about the rest of our out­put, which con­sumed a lot of time yet feels less rele­vant.

We’d like to thank ev­ery­one on the team for mak­ing the Pro­ject pos­si­ble, as well as Owen Cot­ton-Bar­ratt and Max Dal­ton for their valuable sup­port.

The im­pact of the Project

What were the goals? How did we perform on them?

In a document I (Tom) wrote in January 2017, before the Project started, I identified the following goals for it:

  1. Pub­lish on­line doc­u­ments de­tailing con­crete pri­ori­ti­sa­tion rea­son­ing
    This has di­rect benefits for peo­ple who would learn from read­ing it, and in­di­rect benefits by en­courag­ing oth­ers to pub­lish their rea­son­ing too. Sur­pris­ingly few peo­ple in the EA com­mu­nity cur­rently write blog posts ex­plain­ing their dona­tion de­ci­sions in de­tail.

  2. Pro­duce pri­ori­ti­sa­tion re­searchers
    Out­stand­ing par­ti­ci­pants of the Oxford Pri­ori­ti­sa­tion Pro­ject may be made more likely to be­come fu­ture CEA, OpenPhil, or GiveWell hires.

  3. Train­ing for earn-to-givers
    It’s not re­ally use­ful for the av­er­age mem­ber of a lo­cal group to be­come an ex­pert on dona­tion de­ci­sions. Most peo­ple should prob­a­bly defer to a char­ity eval­u­a­tor. How­ever, for peo­ple who earn to give and donate larger sums, it’s of­ten worth spend­ing more time on the de­ci­sion. So the Oxford Pri­ori­ti­sa­tion Pro­ject could be ideal train­ing for peo­ple who are con­sid­er­ing earn­ing to give in the fu­ture.

  4. Give local groups something to do
    (See also Scott Alexander on “pushing vs pulling goals”.) Altruistic societies or groups may often volunteer, organise protests, write policy papers, fundraise, etc., even if the impact on the world is actually negligible. These societies might do these things just to give their members something to do, create a group they can feel part of, and give the society leaders status. But within the effective altruism movement, many of these low-impact activities would appear hypocritical. People in movement building have been thinking about this problem. The Centre for Effective Altruism and other organisations have full-time staff working on local group outreach, but they have not, to my knowledge, proposed new “things to actually do”. The Project is a thing to do that is not outreach.

  5. Heighten the intellectual level of local groups
    Currently most of the EA community is intellectually passive. Many of us have a superficial understanding of prioritisation; we mostly use heuristics and arguments from authority. By having more people in the community who actually do prioritisation (e.g. who actually understand GiveWell’s spreadsheets), we increase the quality of the average conversation.

In addition to these five object-level goals, there was a sixth goal:

  6. The value of information of the Oxford Prioritisation Project
    Much of the expected impact of the Project comes from discovering whether this kind of project can work, and whether it can be replicated in local groups around the world in order to get the object-level impacts many times over.

1: Pub­lish on­line doc­u­ments de­tailing con­crete pri­ori­ti­sa­tion reasoning

Quantity-wise, this goal was achieved. We published 38 blog posts, including posts in which individuals described their current views, minor technical contributions to Bayesian probability theory, discussion transcripts and, most importantly, quantitative models.

However, the extent to which our content engaged with substantial prioritisation questions, and was intellectually useful to the wider EA community, was far less than we expected. Overall, we feel that our main substantive intellectual contribution was our quantitative models. Yet these were extremely speculative and were developed in the last few weeks of the Project, while most of the preceding work was far less useful.

Regarding “direct benefits for people who would learn from reading” our research: this is very difficult to evaluate, but our tentative feeling is that these benefits were smaller than we expected. We received less direct engagement with our research on the EA Forum than we expected, and we believe few people read our models. Indirectly, the models were referenced in some newsletters (for example MIRI’s). However, since our writings will remain online, there may be a small but long-lasting trickle of benefits into the future, from people coming across our models.

Though we did not ex­pect to break ma­jor new con­cep­tual ground in pri­ori­ti­sa­tion re­search, we be­lieved that the EA com­mu­nity pro­vides too many ‘con­sid­er­a­tions’-type and too few ‘weigh­ing’-type1 ar­gu­ments. Mak­ing an ac­tual grant­ing de­ci­sion would hope­fully force us to gen­er­ate ‘weigh­ing’-type ar­gu­ments, and this was a ma­jor im­pe­tus for start­ing the Pro­ject. So, we rea­soned, even though we might not go be­yond the fron­tier of pri­ori­ti­sa­tion re­search, we could nonethe­less be use­ful to peo­ple with the most ad­vanced EA knowl­edge, by pro­duc­ing work that helps ag­gre­gate ex­ist­ing re­search into an ac­tion­able rank­ing. We think we were mod­er­ately suc­cess­ful in this re­spect, thanks to our quan­ti­ta­tive mod­els.

2: Pri­ori­ti­sa­tion researchers

It is technically too early to evaluate this goal, but we are pessimistic about it: we do not think the Project caused any member who otherwise would not have considered it to now consider prioritisation research as a career2. This is based on impressions of, and conversations with, members.

This goal was a ma­jor fac­tor in our de­ci­sions of which ap­pli­cants to ad­mit to the pro­ject. We se­lected sev­eral peo­ple who had less ex­pe­rience with EA top­ics, but who were in­ter­ested and tal­ented, in or­der to in­crease our chance of achiev­ing this sub-goal. In ret­ro­spect, this was clearly a mis­take, since get­ting these peo­ple up to speed proved far more difficult than we ex­pected, and we still don’t think we had a coun­ter­fac­tual im­pact on their ca­reers. Look­ing back, we recog­nise that there was some ev­i­dence for this that we in­ter­preted in­cor­rectly at the time, so we made a mis­take in ex­pec­ta­tion, but not an ob­vi­ous one.

As is often the case, we suspect the impact in this category was extremely skewed across individuals. While we think we had no impact on most members, we think there is a small (<5%)3 chance that we have counterfactually changed the interests and abilities of one team member, such that this person will in the future work in global priorities research.

3: Train­ing for earn-to-givers

This was not achieved, for two reasons. First, while at the outset we believed that about 3 team members might consider earning to give in the future, by the end we think only one of them has a >50% chance of choosing that career path. So even though we provided an opportunity to practise prioritisation thinking, and especially quantitative modelling, we don’t think we had an impact by improving the decisions of future earn-to-givers. Second, and regardless, we believe that this practice failed to increase the prioritisation skill of our team (see previous sections), so we wouldn’t have had impact here anyway.

4: Give lo­cal groups some­thing to do

This goal was achieved. We designed and implemented a new form of object-level group engagement that could theoretically be replicated in other locations. However, it’s debatable whether the cost:benefit ratio of such replications is sufficiently high. See the section “Should there be more similar projects? Lessons for replication”.

5: Lo­cal group epistemics

This goal was not achieved.

One im­pulse for start­ing the pro­ject was a frus­tra­tion about the lack of in-depth, ob­ject-level in­tel­lec­tual ac­tivity in the lo­cal stu­dent EA com­mu­nity, which we (Ja­cob and Tom) are both part of. Cur­rent ac­tivi­ties look like:

  • At­tend­ing and or­ganis­ing in­tro­duc­tory events

  • So­cial events, where con­ver­sa­tions fo­cus on:

    • Dis­cussing new de­vel­op­ments in EA organisations

    • Philos­o­phy, es­pe­cially ethics

    • ‘Considerations’-type arguments, with a special focus on controversial or extreme arguments. Much repetition of well-known arguments.

  • Fundraising

We wanted to see more of:

  • Dis­cus­sion of ‘weigh­ing’-type ar­gu­ments, with a fo­cus on spe­cific, quan­tifi­able claims

  • In­stead of rep­e­ti­tion of known con­sid­er­a­tions, dis­cus­sion of in­di­vi­d­ual peo­ple’s ac­tual be­liefs on core EA ques­tions, and what would change their minds. Con­ver­sa­tions at the fron­tier of peo­ple’s knowl­edge.

  • Know­ing when peo­ple change their minds

  • In­di­vi­d­u­als con­duct­ing shal­low (3-20 hour), em­piri­cal or the­o­ret­i­cal re­search pro­jects, and pub­lish­ing them online

We did not be­lieve that the Pro­ject alone could have achieved any of these changes. But we were op­ti­mistic that it would help push in that di­rec­tion. We thought mem­bers of the lo­cal group would be­come ex­cited about the Pro­ject, dis­cuss its tech­ni­cal de­tails, and give feed­back. We also thought that team mem­bers would so­cial­ise with lo­cal group mem­bers and dis­cuss their work, and hoped that the Pro­ject would serve as a model in­spiring other more in­tel­lec­tu­ally fo­cused ac­tivi­ties. None of these hap­pened.

The lo­cal com­mu­nity was largely in­differ­ent to the Pro­ject, as ev­i­denced by an at­ten­dance of no more than 10 peo­ple at our fi­nal de­ci­sion an­nounce­ment. Through­out the Pro­ject, there was lit­tle in­ter­ac­tion be­tween the com­mu­nity and team mem­bers. In ret­ro­spect, we think we could have done more to fa­cil­i­tate and en­courage such in­ter­ac­tion. But we were already very busy as things were, so this would have needed to trade off against an­other of our ac­tivi­ties.

Over­all we clearly didn’t achieve this goal.

6: The value of information of the Oxford Prioritisation Project

This goal was ar­guably achieved, in the sense that the Pro­ject pro­duced sev­eral un­ex­pected re­sults which carry im­por­tant im­pli­ca­tions for fu­ture pro­jects. The in­for­ma­tion gained in­cluded:

Learn­ing value for Ja­cob and Tom

The Project was a huge learning experience for us both, and especially for Tom, for whom this was the first time leading a team. Running a group of prioritisation researchers was a very different task from the academic projects or internships we had been involved with in the past.

Our guess is that between 25% and 75% of the value created by the Project was through our becoming wiser and more experienced. This admittedly subjective conclusion relies on a number of difficult-to-verbalise intuitions, to the effect that we came out of the Project knowing more about our own strengths and weaknesses, and how people and groups work. Since we both plan to give substantial weight to altruistic considerations in our career decisions, this could be impactful.

Through­out the Pro­ject, Tom kept a jour­nal of spe­cific learn­ing points, mostly for his own benefit but also for oth­ers who would po­ten­tially be in­ter­ested in repli­cat­ing the Pro­ject. He origi­nally planned to turn these notes into a well-struc­tured and de­tailed ret­ro­spec­tive, but com­plet­ing this work now looks as though it would not be worth the time cost. In­stead he is pub­lish­ing his notes with min­i­mal edit­ing here. Th­ese files re­flect Tom’s views at the time of writ­ing (in­di­cated on each doc­u­ment); he may not en­dorse them in full any­more. They cover the fol­low­ing top­ics, in alpha­bet­i­cal or­der:

What were the costs of the pro­ject?

Stu­dent time

Tom tracked 308 fo­cused po­modoros (~ 150 hours) on this pro­ject, and es­ti­mates that the true num­ber of fo­cused hours was closer to 500. Tom also es­ti­mates he ded­i­cated at least an­other 200 hours of less fo­cused time to the Pro­ject.

Ja­cob es­ti­mates he spent 100 hours on the Pro­ject.
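The figures above can be cross-checked with a few lines of arithmetic. This sketch assumes a pomodoro is counted as 30 minutes (a 25-minute work block plus a 5-minute break); that is an assumption, since the text does not say which convention Tom used. It covers only Jacob’s and Tom’s time, because no estimate is given for other team members.

```python
# Rough tally of student time spent on the Project, using the estimates above.
# ASSUMPTION: one pomodoro = 30 minutes (25 min work + 5 min break);
# the original text does not say which convention Tom used.
POMODORO_MINUTES = 30

tracked_pomodoros = 308
tracked_hours = tracked_pomodoros * POMODORO_MINUTES / 60  # 154.0, i.e. "~150 hours"

tom_focused_hours = 500    # Tom's estimate of his true focused hours
tom_unfocused_hours = 200  # "at least another 200 hours" of less focused time
jacob_hours = 100          # Jacob's estimate

tom_total = tom_focused_hours + tom_unfocused_hours  # at least 700 h
combined = tom_total + jacob_hours                   # at least 800 h (excludes other members)

print(f"tracked pomodoros: ~{tracked_hours:.0f} h")
print(f"Tom: >= {tom_total} h, Jacob: ~{jacob_hours} h, combined: >= {combined} h")
```

With 25-minute pomodoros instead, the tracked figure would come to about 128 hours.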

CEA money and time

We would guess that the real costs of the £10,000 grant were low. At the out­set, the prob­a­bil­ity was quite high that the money would even­tu­ally be granted to a high-im­pact or­gani­sa­tion, with a cost-effec­tive­ness not sev­eral times smaller than CEA’s coun­ter­fac­tual use of the money3. In fact, the grant was given to 80,000 Hours.
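One way to spell out the reasoning above (an illustrative framing, not a calculation from the original evaluation): if CEA would otherwise have spent the £10,000 itself, the real cost of the grant is the value forgone, which shrinks to zero as the recipient’s cost-effectiveness approaches that of the counterfactual use. The function name and the ratios below are hypothetical.

```python
# Value forgone when the grant goes to a recipient instead of CEA's counterfactual use.
# ratio = recipient's cost-effectiveness / counterfactual use's cost-effectiveness.
# ASSUMPTION: illustrative framing only; the original text estimates no ratio.


def real_cost(grant_gbp: float, ratio: float) -> float:
    """Counterfactual-equivalent cost in GBP; a negative value means a net gain."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return grant_gbp * (1.0 - ratio)


for ratio in (0.5, 1.0, 1.5):  # hypothetical ratios
    print(f"ratio {ratio:.1f}: real cost ~ GBP {real_cost(10_000, ratio):,.0f}")
```

A ratio above 1 gives a negative cost, meaning the grant did more good than the counterfactual use would have.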

The costs of snacks and drinks for our meetings, and of logistics for the final event, were about £500, covered by CEA’s local group budget.

We very ten­ta­tively es­ti­mate that Owen Cot­ton-Bar­ratt spent less than 5 hours, and Max Dal­ton about 15 hours, helping us with the Pro­ject over the six months in which it ran. We are very grate­ful to both for their valuable sup­port.

Main challenges

We faced a num­ber of challenges; we’ll de­scribe only the biggest ones, tak­ing them in rough chronolog­i­cal or­der.

Differ­ent lev­els of pre­vi­ous experience

Heterogeneous starting points

Some team mem­bers were ex­pe­rienced with ad­vanced EA top­ics, while oth­ers were be­gin­ners with an in­ter­est in cost-effec­tive char­ity. This was in part be­cause we ex­plic­itly aimed to in­clude some less ex­pe­rienced team mem­bers at the re­cruit­ment stage (see above, “2: Pri­ori­ti­sa­tion re­searchers”). But an equally im­por­tant fac­tor was that, be­fore we met them in per­son, we over­es­ti­mated some team mem­bers’ un­der­stand­ing of pri­ori­ti­sa­tion re­search.

We se­lected the team ex­clu­sively with an on­line ap­pli­ca­tion form. Once the pro­ject started, and we be­gan talk­ing to them in per­son, it quickly be­came clear that we had over­es­ti­mated many team mem­bers’ fa­mil­iar­ity with the ba­sic ar­gu­ments, con­cepts, and stylised facts that con­sti­tute the ground­work of pri­ori­ti­sa­tion work. Pos­si­ble ex­pla­na­tions for our mis­take in­clude:

  • Typical mind fallacy, or insufficient empathy with applicants. Because we knew much more, we unconsciously filled gaps in people’s applications. For example, if someone was vaguely gesturing at a concept, we would immediately understand not only the argument they were thinking of, but also many variations and nuances of this argument. This could lead us to believe that the applicant had made the nuanced argument.

  • Wish­ful think­ing. We were ex­cited by the idea of build­ing a knowl­edge­able team, so we may have been mo­ti­vated to ig­nore coun­ter­vailing ev­i­dence.

  • Underestimating applicants’ desire and ability to show themselves in the best light. We neglected to account for the fact that applicants could carefully craft their text to emphasise their strengths, and mistakenly treated their applications more as if they were transcripts of an informal conversation.

  • In­suffi­ciently dis­crim­i­na­tive ap­pli­ca­tion ques­tions. Tom put sig­nifi­cant effort into de­sign­ing a short but in­for­ma­tive ap­pli­ca­tion. Ap­pli­cants were asked to provide a CV, a Fermi es­ti­mate of the to­tal length of wa­ter­slides in the US, and a par­tic­u­lar re­search ques­tion they ex­pected to en­counter dur­ing the pro­ject, along with their ap­proach for an­swer­ing it. After the fact, we not only think that these spe­cific ques­tions were sub­op­ti­mal4, but also see clear ways the ap­pli­ca­tion pro­cess as a whole could have been done very differ­ently and much bet­ter (see sec­tion “Smaller team” be­low). We strug­gle to think of ev­i­dence for this that we in­ter­preted in­cor­rectly at the time, so this may still have been the cor­rect de­ci­sion in ex­pec­ta­tion.

Con­tinued heterogeneity

A heterogeneous team alone would not have been a major problem if we hadn’t also dramatically overestimated our ability to bring the less experienced members up to speed.

We had planned to spend several weeks at the beginning of the Project working especially proactively with these team members to fill remaining gaps in their knowledge. We prepared a list of “prioritisation research concepts”, held a rotating series of short presentations on them, and gave specific team members relevant reading material. We expected that team members would learn quickly from each other and “learn by doing”, by trying their hand at prioritisation.

In fact, we made barely any progress. For all ex­cept one team mem­ber, we feel that we failed to bring them sub­stan­tially closer to be­ing able to mean­ingfully con­tribute to pri­ori­ti­sa­tion re­search: ev­ery­one re­mained largely at their pre­vi­ous lev­els, some high, some low5.

This makes us sub­stan­tially more pes­simistic about the pos­si­bil­ity of fos­ter­ing EA re­search tal­ent through proac­tive schemes rather than let­ting in­di­vi­d­u­als learn or­gan­i­cally. (EA Berkeley seemed more pos­i­tive about their stu­dent-led EA class, call­ing it “very suc­cess­ful”, but we be­lieve it was many times less am­bi­tious). We feel more con­fi­dent that there is a ba­sic global pri­ori­ti­sa­tion mind­set, which is ex­tremely rare and difficult to change by cer­tain kinds of out­side in­ter­ven­tion, but es­sen­tial for EA re­searchers.

Team breakdown

We were strug­gling to cre­ate a co­he­sive team where ev­ery­one was able to con­tribute to the shared goal of op­ti­mally al­lo­cat­ing the £10,000, and was mo­ti­vated to do so. Mean­while, some team mem­bers be­came less en­gaged, per­haps as a re­sult of the lack of visi­ble signs of progress. Meet­ing at­ten­dance be­gan to de­cline, and the prob­lem wors­ened un­til the end of the Pro­ject, at which point four out of nine team mem­bers had dropped out. After the pro­ject only 3 out of 7 team mem­bers took the post-pro­ject sur­vey. The re­sults have in­formed our es­ti­mates through­out this eval­u­a­tion.

While understanding that the dropout rate for volunteer projects is typically high, we still perceived this as a frustrating failure. An unexpected number of team members encountered health or family problems, while others simply lost motivation. Starting around halfway through the Project, the majority of our efforts were focused on averting a complete dissolution of the team, which would have ended the Project.

As a re­sult, we de­cided to severely cur­tail the am­bi­tion of the Pro­ject by choos­ing our four short­listed char­i­ties our­selves, with­out team in­put, and ac­cord­ing to differ­ent crite­ria than those we had origi­nally en­vi­sioned6. We had been plan­ning to short­list the or­gani­sa­tions with the high­est ex­pected im­pact, as a team, in a prin­ci­pled way. In­stead we (Ja­cob and Tom) took into ac­count our hunches about ex­pected im­pact as well as the in­tel­lec­tual value of pro­duc­ing a quan­ti­ta­tive model of a par­tic­u­lar or­gani­sa­tion, in or­der to ar­rive at a highly sub­jec­tive and un­der-jus­tified judge­ment call.

We are satis­fied with this de­ci­sion; we be­lieve that it al­lowed us to cre­ate most of the value that could still be cap­tured at that stage, given the cir­cum­stances. With a smaller team and a more fo­cused goal, we pro­duced the four quan­ti­ta­tive mod­els which led to our fi­nal de­ci­sion.

Should there be more similar pro­jects? Les­sons for replication

Did the Pro­ject achieve pos­i­tive im­pact?

Costs and benefits

See above, “What were the costs of the pro­ject?”.

Tom’s feelings

It’s important to distinguish between the impacts of the Project from a purely impartial perspective, and the impacts according to my values, which give a much larger place to my own and my friends’ well-being.

Given that the ob­ject-level im­pacts (see above, “The im­pact of the Pro­ject”) were, in my view, low, effects on Ja­cob’s and my per­sonal tra­jec­to­ries (our aca­demic perfor­mance, well-be­ing, skill-build­ing) could be im­por­tant, even from an im­par­tial point of view.

Against a coun­ter­fac­tual of “no Oxford Pri­ori­ti­sa­tion Pro­ject” (say, if the idea had not been sug­gested to me, or if we had not re­ceived fund­ing), I would guess with low con­fi­dence that the Pro­ject had nega­tive (im­par­tial) im­pact. Without the Pro­ject, I would have spent these 6+ months hap­pier and less stressed, with more time to spend on my stud­ies. I plan to give sig­nifi­cant weight to al­tru­is­tic con­sid­er­a­tions in my ca­reer de­ci­sions, so this alone could have made the pro­ject net-nega­tive. In ad­di­tion, I be­lieve I would have spent sig­nifi­cant time think­ing about ob­ject-level pri­ori­ti­sa­tion ques­tions on my own, and pub­lished my thoughts in some form. On the other hand, I learned a lot about team man­age­ment and my own strengths and weak­nesses through the Pro­ject. All things con­sid­ered, I sus­pect that the Pro­ject was a lit­tle bit less good than this coun­ter­fac­tual.

When it comes to my own values, I’m slightly more confident that the Project was negative against this counterfactual. If offered the chance to go back in time and re-experience the same events, I would probably decline.

Both im­par­tially and per­son­ally speak­ing, there are some nearby coun­ter­fac­tu­als against which I am slightly more con­fi­dent that the Pro­ject was nega­tive. Th­ese mostly take the form of de­vel­op­ing quan­ti­ta­tive mod­els with two or three close friends, in an in­for­mal set­ting, and iter­at­ing on them rapidly, with or with­out money to grant. How­ever, these coun­ter­fac­tu­als are un­likely; at the time I didn’t have the in­for­ma­tion to re­al­ise how good they would be.

Go­ing back now to the im­par­tial per­spec­tive: de­spite my weakly held view de­scribed above, there are sev­eral sce­nar­ios for pos­i­tive im­pact from the Pro­ject which I find quite plau­si­ble. For ex­am­ple, I would con­sider the Pro­ject to have paid for it­self rel­a­tive to rea­son­able coun­ter­fac­tu­als if:

  • what I learned from the Pro­ject helps me im­prove a ma­jor ca­reer decision

  • the team mem­ber men­tioned above ends up pur­su­ing global pri­ori­ties research

  • we inspire another group to launch a project based on our model, and they achieve radically better outcomes

Things we would ad­vise chang­ing if the pro­ject were replicated

Less ambition

Global pri­ori­ti­sa­tion is very challeng­ing for two rea­sons. First, the search space con­tains a large num­ber of pos­si­ble in­ter­ven­tions and or­gani­sa­tions. Se­cond, the search space spans mul­ti­ple very differ­ent fo­cus ar­eas, such as global health and ex­is­ten­tial risk re­duc­tion.

The Project aimed to tackle both of these challenges. This high level of ambition was a conscious decision; we were excited by the lack of artificial restrictions on the search space. Though we had no unrealistic hopes of finding the truly best intervention, or of breaking significant new ground in prioritisation research, we still felt that an unrestricted search space would make the task more valuable: it made the task feel more real, and less like a student’s exercise. We implicitly predicted that other team members would also be more motivated by the ambitious nature of the Project, but this turned out not to be the case. If anything, motivation increased after we shifted to less ambitious goals.

Given that our ini­tial goal proved too difficult, even given the tal­ent pool available in Oxford, we would recom­mend that po­ten­tial repli­ca­tions re­strict the search space to elimi­nate one of the two challenges. This gives two op­tions:

  • Pri­ori­ti­sa­tion among a pre-es­tab­lished short­list of or­gani­sa­tions work­ing in differ­ent fo­cus ar­eas. (This is the op­tion we chose to­wards the end of the Pro­ject).

  • Pri­ori­ti­sa­tion in a (small) fo­cus area, such as men­tal health or biose­cu­rity.

We would weakly recom­mend the former rather than the lat­ter, be­cause we already tried it with mod­er­ate suc­cess, and be­cause it al­lows start­ing im­me­di­ately with quan­ti­ta­tive mod­els of the short­listed or­gani­sa­tions (see be­low, “Fo­cus on quan­ti­ta­tive mod­els from the be­gin­ning”).

Shorter duration

Given the cir­cum­stances, we be­lieve the Pro­ject was too long. A shorter pro­ject means less is lost if the Pro­ject fails, and the closer dead­line could be mo­ti­vat­ing.

We would recom­mend one of two mod­els:

  • 1-month pro­ject with meet­ings and work ses­sions at intervals

  • 1-week retreat, working on the project full-time

Our most important work, building the actual quantitative models and deciding on their inputs, was done in about this amount of time. The large early part of the Project, centred on learning and searching for candidate organisations, was not very useful at the margin (see e.g. section “Continued heterogeneity” above).

Use a smaller grant if it seems easier

We ini­tially be­lieved that the rel­a­tively large size of the grant (£10,000) would mo­ti­vate team mem­bers not only to work hard, but, more im­por­tantly, to be epistem­i­cally vir­tu­ous – that is, to fo­cus their efforts on ac­tions to im­prove the fi­nal al­lo­ca­tion rather than ones that felt fun or so­cially nor­ma­tive. We now be­lieve that this effect is small, and does not de­pend much on the size of the grant. For more in­for­ma­tion, see sec­tion “More gen­eral up­dates about epistemics, teams and com­mu­nity”.

Where the £10,000 figure may have helped is through get­ting us more and bet­ter ap­pli­cants by sig­nal­ling se­ri­ous in­tent. But we are very un­cer­tain about this con­sid­er­a­tion and would give it low weight.

Overall, our view is that the benefits of a larger grant size are quite small relative to other strategic decisions, and that a £1000-2000 grant might have achieved nearly all the same benefits7. On the other hand, the true monetary cost of the grant is low (see above, “CEA money and time”). Therefore our tentative recommendation is: a £10,000 grant may still be worth getting, but don’t worry too much about it. Be upfront with the funder about the small effect of the money, and consider going ahead anyway if they give you less.

Fo­cus on quan­ti­ta­tive mod­els from the beginning

One of the surprising updates from the Project was that we made much more progress, including with the less experienced team members, once we began working on explicit quantitative models of specific organisations. (See section “More general updates about epistemics, teams and community”.) So we would recommend starting with quantitative models from the first day, even when they are very simple. This may sound surprising, but we urge you to try it8.
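To show what a “very simple” day-one model might look like, here is a minimal Monte Carlo sketch. Everything in it is hypothetical: the two organisations, the input ranges, and the choice of log-uniform and uniform distributions are placeholders, not the Project’s actual models or estimates. The point is the structure: uncertain inputs, multiplied through to a single cost-effectiveness figure that can be compared across candidates.

```python
import math
import random
import statistics

# HYPOTHETICAL day-one model: value per £1,000 donated =
#   people reached per £1,000 x probability the intervention works x value per success.
# Organisations, input ranges and distributions are made-up placeholders.


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample uniformly in log-space between low and high (both must be > 0)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def simulate(inputs: dict, n: int = 10_000, seed: int = 0) -> list:
    """Return n Monte Carlo samples of value per £1,000 for one candidate."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    samples = []
    for _ in range(n):
        reach = log_uniform(rng, *inputs["reach_per_1000"])
        p_success = rng.uniform(*inputs["p_success"])
        value = log_uniform(rng, *inputs["value_per_success"])
        samples.append(reach * p_success * value)
    return samples


# Hypothetical candidates; each range is (low, high).
candidates = {
    "Org A": {"reach_per_1000": (5, 50), "p_success": (0.1, 0.5), "value_per_success": (1, 10)},
    "Org B": {"reach_per_1000": (0.5, 5), "p_success": (0.05, 0.2), "value_per_success": (10, 500)},
}

for name, inputs in candidates.items():
    samples = simulate(inputs)
    print(f"{name}: mean {statistics.mean(samples):.1f}, median {statistics.median(samples):.1f}")
```

Keeping each input as a range rather than a point estimate makes disagreements about specific inputs explicit, which is one way to steer discussion toward ‘weighing’-type arguments. A fuller model would justify each range and check how sensitive the comparison is to it.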

More homogeneous team

We severely overestimated the degree to which we could bring less experienced members of a heterogeneous team up to speed (see above, “Different levels of previous experience”). So we would recommend a homogeneous team, with all members meeting a high threshold of experience.

Smaller team

We started with a model where, very roughly, people are either good, conscientious team members, or they become demotivated and drop out of the team. Under this model, dropouts are not very costly: you lose a team member, but you also lose the overhead of dealing with them. So that is a reason to start with a bigger team. However, what we actually observed is that demotivated people don’t like working, but the one thing they dislike more is dropping out. Without speculating about the underlying reasons, a common strategy while demotivated (one that we ourselves have also been guilty of in the past) is to do the minimal amount of work required to avoid dropping out. Hence, future projects should start with a small team of fully dedicated members rather than a larger team intended to provide some “buffer” should a team member drop out.
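A toy calculation makes the contrast between the two models concrete. All parameters (output per engaged member, output of a demotivated member, organiser overhead per member, team sizes and demotivation share) are hypothetical values chosen only to show the direction of the effect, not estimates from the Project. It also assumes, as the paragraph suggests, that a small, carefully selected team stays fully dedicated.

```python
# HYPOTHETICAL toy comparison of the two dropout models described above.
# All parameters are made up for illustration (arbitrary units).
OUTPUT = 10.0    # contribution of a fully engaged member
MINIMAL = 1.0    # contribution of a demotivated member doing the bare minimum
OVERHEAD = 3.0   # organiser effort spent on each member who stays on the team


def net_output(team_size: int, demotivated_share: float, demotivated_stay: bool) -> float:
    """Expected net output of a team under one of the two dropout models."""
    engaged = team_size * (1 - demotivated_share)
    demotivated = team_size * demotivated_share
    net = engaged * (OUTPUT - OVERHEAD)
    if demotivated_stay:
        # Observed model: demotivated members stay, add little, and still cost overhead.
        net += demotivated * (MINIMAL - OVERHEAD)
    # Under the initial model (demotivated_stay=False), demotivated members have
    # dropped out, so they contribute nothing and cost no overhead.
    return net


# Small, fully dedicated team vs. larger team where half become demotivated.
for size, share in ((4, 0.0), (9, 0.5)):
    print(f"team of {size}: drop-out model {net_output(size, share, False):.1f}, "
          f"stay model {net_output(size, share, True):.1f}")
# team of 4: drop-out model 28.0, stay model 28.0
# team of 9: drop-out model 31.5, stay model 22.5
```

With these made-up numbers the larger team looks better under the initial model (31.5 vs 28.0), but the ranking flips under the observed model (22.5 vs 28.0), which is the intuition behind recommending a smaller team.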

Having a smaller team also means expending more effort selecting team members; doing so should also help with the problem raised in the previous subsection. Although it seemed to us that we had already put a lot of resources into finding the right team, we now see that much more could have been done here, for example:

  • a fully fledged trial period, consisting of a large, top-down-managed team from which the few most promising members are then selected to continue (although we worry this could introduce anti-cooperative norms and an unpleasant atmosphere);

  • structuring the application process as follows: candidates build a quantitative model, receive feedback, and submit a revised version; they are then evaluated on the quality of their models.

More general updates about epistemics, teams, and community

We had several hypotheses about how a project like this would affect the members involved, as well as the larger effective altruism community in which it took place. Here are some of our updates. Of course, the Project is only a single data point. Nonetheless, we think it still carries evidence, in the same sense that, say, an elaborate field study can be informative without being an RCT.

The epistemic atmosphere of a group will be more truth-seeking when a large donation is conditional on its performance.

Update: substantially lower credence

There are at least two reasons why human intellectual interaction is often not truth-seeking. First, there are conflicting incentives. These can be internal (cognitive biases) or external (structural incentives).

To some extent the money helped realign incentives. The looming £10,000 decision provided a Schelling point that could be used to end less useful discussions or sub-projects without the violation of social norms that doing so otherwise often entails.

Nonetheless, the atmosphere also suffered from many common cognitive biases. These included deferring too much to perceived authorities, and discussing topics that one likes, feels comfortable with, or knows many interesting facts about. It is possible that the disciplining “skin in the game” effect we were hoping for failed to occur because the grant was altruistic, and of little relevance to team members personally. In response to this, team members also pledged a secret amount of their own money to the eventual recipient (with pledges ranging from £0 to £200)10. It is difficult to disentangle the consequences of this personal decision from the donation at large, but the pledges may have suffered from the same problem, since they too were altruistic. Given how insensitive people generally are to even astronomical differences in charity impact, which top charity one’s donation goes to may not make enough psychological difference to offset other incentives.

Second, truth-seeking interaction is also difficult for reasons unrelated to incentives: it requires certain mental skills that have to be deliberately trained (see also “Different levels of previous experience”). Finding the truth, in general, is hard. For example, we strongly encouraged team members to centre discussions around cruxes, but the cruxes members gave were most often not things that would actually change their minds about X. Instead, they were generic evidence regarding X, or evidence that would clearly falsify X but that they were confident would never be found. This was true for basically all members of the Project, often including ourselves.

Rather than the looming large donation, what seems to have improved the epistemic atmosphere were practices such as framing disagreements in terms of which quantitative model input they would change, and working within a strict time limit (e.g. a fixed meeting end time). For more on this, see the section “Focus on quantitative models from the beginning”.

A major risk to the Project is that people hold on too strongly to their pre-Project views

Update: lower credence

We nicknamed this the “pet charities” problem: participants start the project with a view about which grantees are most cost-effective, and see themselves as having to defend it. They engage with contrary evidence, but only to argue against it, or to find some reason their original grantee is still superior.

This was hardly a problem, but something in the vicinity was. People rarely defended a view strongly, but mostly because they didn’t feel comfortable engaging with competing views at all. Instead, participants strongly preferred to continue researching the area they already knew and cared most about, even as other participants were doing the same with a different area. Participants’ different choices of area implied conflicting premises, but they proved extremely reluctant to attempt to resolve this disagreement. We might call this the “pet areas” problem, or the problem of “lower bound propagation”. (This is because participants may informally be using the heuristic “consider only interventions better than X”, with very different values of X.)

Another problem that proved bigger than pet charities was over-updating on authoritative opinions (such as Tom’s current ranking of grantees). We see this as linked to the lack of comfort or confidence mentioned above.

A large majority of team applicants would be people we knew personally.

Update: false

We’re both socially close to the EA community in Oxford. We expected to more or less know all applicants personally: if someone was interested enough in EA to apply, we would have come across them somehow.

Instead, a large number of applications came from people we didn’t know at all, a few of whom ended up being selected for the team. We conclude that, at least in Oxford, there are many “lurkers”: people who are interested in EA but find the current offerings of the local group uninspiring, so that they don’t get involved at all. There appear to be many talented people who are only prepared to work on an EA project if it stands out to them as particularly interesting. Although we would generally advise caution, this could be one reason to be more optimistic about replications of the Project.

  1. A useful distinction is between ‘considerations’-type arguments and ‘weighing’-type arguments. Considerations-type arguments contain new facts or reasoning that should shift our views, other things being equal, in a certain direction. Sometimes, in addition to the direction of the shift, these arguments give an intuitive idea of its magnitude. Weighing-type arguments, on the other hand, take existing considerations and use them to arrive at an all-things-considered view. The magnitude of different effects is explicitly weighed. Considerations-type arguments involve fewer questionable judgement calls and more conceptual novelty, which is one reason we believe they are oversupplied relative to weighing-type arguments. Tom believed this strongly enough that it was one of his motivations for launching the Project, but we both agree that this is something reasonable people can disagree about. On a draft of this piece, Max Dalton wrote: “They also tend to produce shifts in view that are less significant, both in the sense of less revolutionary, and in the sense of the changes tending to have less impact. This is partly because weighing-type arguments are more commonly used in cases where you’re picking between two good options. Because I think weighing-type arguments tend to be lower-impact, I’m not sure I agree with your conclusion. My view here is pretty low-resilience.”

  2. To be clear, however, there are team members who are seriously considering that career path.

  3. Tom’s guess: 15% chance this person goes into prioritisation research. Conditional on him or her doing so, a ~30% chance we caused it.

  4. We also had a safeguard in place to avoid the money being granted to an obviously poor organisation, in case the project went dangerously off the rails: Owen Cotton-Barratt had veto power over the final grant (although he says he would have been reluctant to use it).

  5. The Fermi question was too easy, in that it didn’t help discriminate between top applicants, and the research proposal question was too vague and should have required more specifics.

  6. Jacob adds: “It should be emphasized that we are disregarding any general intellectual progress here. It is plausible that several team members learned new concepts and practiced critical thinking, and as a result grew intellectually from the project – just not in a direction and extent that would help with global prioritisation work in particular.”

  7. More goal flexibility, earlier on, would have been good. We had ambitious goals for the Project, which we described publicly online and in conversations with funders and others we respect. In attempting to achieve these goals, we believe we were quite flexible and creative, trying many different approaches. But we were too rigid about the (ultimately instrumental) Project goals. Partly, we felt that changing them would be an embarrassment; we avoided doing so because it would have been painful in the short run. But it now seems clear that we could have better achieved our terminal goals by modifying the Project’s goals.

  8. There is significant disagreement about this among people we’ve discussed it with, on and off the team. We note that being more heavily involved in the Project seems to correlate with believing that a small grant would have achieved most of the benefits. Outsiders tend to believe that more money is good, while we who led the Project believe the effects are small. (A middle ground of £5,000 has tended to produce more agreement.) People we respect disagree with us, which you should take into account when forming your own view.

  9. Jacob adds: “One of our most productive and insightful sessions was when we spent about six hours deciding the final inputs into the models. It is plausible that this single session was equally intellectually productive and decision-guiding as the first few weeks of the project combined.”

  10. A team member comments that “skin in the game” effects may encourage avoiding bad outcomes more than working extra hard for good outcomes: “Subjectively, I felt it as a constant ‘safety net’ to know that we’d most likely give to a good charity that ends up being in a certain range of uncertainty that the experts concede, and that it was almost impossible for us to blow 10,000GBP on something that would be anywhere near low impact”.

Written on October 12, 2017