Intellectual Progress Inside and Outside Academia

This post is taken from a recent Facebook conversation that included Wei Dai, Eliezer Yudkowsky, Vladimir Slepnev, Stuart Armstrong, Maxim Kesin, Qiaochu Yuan and Robby Bensinger, about the ability of academia to make the key intellectual progress required in AI alignment.

[The above people all gave permission to have their comments copied here. Some commenters requested that their replies not be made public, and their comment threads were not copied over.]

Initial Thread

Wei Dai:

Eliezer, can you give us your take on this discussion between me, Vladimir Slepnev, and Stuart Armstrong? I’m especially interested to know if you have any thoughts on what is preventing academia from taking or even recognizing certain steps in intellectual progress (e.g., inventing anything resembling Bitcoin or TDT/UDT) that non-academics are capable of. What is going on there, and what do we need to do to avoid possibly suffering the same fate? See this and this.

Eliezer Yudkowsky:

It’s a deep issue. But stating the obvious is often a good idea, so to state the obvious parts: we’re looking at a lot of principal-agent problems, Goodhart’s Law, bad systemic incentives, hypercompetition crowding out voluntary contributions of real work, the blind leading the blind and second-generation illiteracy, etcetera. There just isn’t very much in the academic system that promotes any kind of real work getting done, and there are a lot of other rewards and incentives instead. If you wanted to get productive work done inside academia, you’d have to ignore all the incentives pointing elsewhere, and then you’d (a) be leading a horrible unrewarded life, (b) fall off the hypercompetitive frontier of the major journals, and (c) find that nobody else was particularly incentivized to pay attention to you except under unusual circumstances. Academia isn’t about knowledge. To put it another way, although there are deep things to say about the way in which bad incentives arise, the skills that are lost, the particular fallacies that arise, and so on, it doesn’t feel to me like the *obvious* bad incentives are inadequate to explain the observations you’re pointing to. Unless there’s some kind of psychological block preventing people from seeing all the obvious systemic problems, it doesn’t feel like the end result ought to be surprising.

Of course, a lot of people do seem to have trouble seeing what I’d consider to be obvious systemic problems. I’d chalk that up to not enough fluency with Moloch’s toolbox, plus their not being status-blind and assigning non-zero positive status to academia, which makes them emotionally reluctant to correctly take all the obvious problems at face value.

Eliezer Yudkowsky (cont.):

It seems to me that I’ve watched organizations like OpenPhil try to sponsor academics to work on AI alignment, and it seems to me that they just can’t produce what I’d consider to be real work. The journal paper that Stuart Armstrong coauthored on “interruptibility” is a far step down from Armstrong’s other work on corrigibility. It had to be dumbed way down (I’m counting obscuration with fancy equations and math results as “dumbing down”) to be published in a mainstream journal. It had to be stripped of all the caveats and any mention of explicit incompleteness, which is necessary meta-information for any ongoing incremental progress, not to mention important from a safety standpoint. The root cause can be debated, but the observable seems plain. If you want to get real work done, the obvious strategy would be to not subject yourself to any academic incentives or bureaucratic processes. Particularly including peer review by non-“hobbyists” (peer commentary by fellow “hobbyists” still being potentially very valuable), or review by grant committees staffed by the sort of people who are still impressed by academic sage-costuming and will want you to compete against pointlessly obscured but terribly serious-looking equations.

Eliezer Yudkowsky (cont.):

There are a lot of detailed stories about good practice and bad practice, like why mailing lists work better than journals because of that thing I wrote on FB somewhere about why you absolutely need 4 layers of conversation in order to have real progress, while journals do 3 layers, which doesn’t work. If you’re asking about those, it’s a lot of little long stories that add up.

Subthread 1

Wei Dai:

Academia is capable of many deep and important results, though, like complexity theory, public-key cryptography, zero-knowledge proofs, and vNM and Savage’s decision theories, to name some that I’m familiar with. It seems like we need a theory that explains why it’s able to take certain kinds of steps but not others, or maybe why the situation has gotten a lot worse in recent decades.

That academia may not be able to make progress on AI alignment is something that worries me, and a major reason for me to be concerned about this issue now. If we had a better, more nuanced theory of what is wrong with academia, that would be useful for guiding our own expectations on this question, and perhaps also help persuade people in charge of organizations like OpenPhil.

Qiaochu Yuan:

Public-key cryptography was invented by GCHQ first, right?

Wei Dai:

It was independently reinvented by academia, with only a short delay (4 years according to Wikipedia) and using far fewer resources than the government agencies. That seems good enough to illustrate my point that academia is (or at least was) capable of doing good and efficient work.

Qiaochu Yuan:

Fair point.

I’m a little concerned about the use of the phrase “academia” in this conversation not cutting reality at the joints. Academia may simply not be very homogeneous over space and time—it certainly seems strange to me to lump von Neumann in with everyone else, for example.

Wei Dai:

Sure, part of my question here is how to better carve reality at the joints. What’s the relevant difference between the parts (in space and/or time) of academia that are productive and the parts that are not?

Stuart Armstrong:

Academia is often productive. I think the challenge is mainly getting it to be productive on the right problems.

Wei Dai:

Interesting, so maybe a better way to frame my question is: of the times that academia managed to focus on the right problems, what was responsible for that? Or, what is causing academia to not be able to focus on the right problems in certain fields now?

Subthread 2

Eliezer Yudkowsky:

Things have certainly gotten a lot worse in recent decades. There are various stories I’ve theorized about that, but the primary fact seems pretty blatant. Things might be different if we had the researchers and incentives from the 1940s, but modern academics are only slightly less likely to sprout wings than to solve real alignment problems as opposed to fake ones. They’re still the same people and the same incentive structure that ignored the entire issue in the first place.

OpenPhil is better than most funding sources, but not close to adequate. I model them as having not seen past the pretend. I’m not sure that more nuanced theories are what they need to break free. Sure, I have a dozen theories about various factors. But ultimately, most human institutions through history haven’t solved hard mental problems. Asking why modern academia doesn’t invent UDT may be like asking why JC Penney doesn’t. It’s just not set up to do that. Nobody is being docked a bonus for writing papers about CDT instead. Feeling worried, and like something is out of place, about the College of Cardinals in the Catholic Church not inventing cryptocurrencies suggests a basic mental tension that may not be cured by more nuanced theories of the sociology of religion. Success is unusual and calls for explanation; failure doesn’t. Academia in a few colleges in a few countries used to be in a weird regime where it could solve hard problems; times changed, and it fell out of that weird place.

Rob Bensinger:

It’s not actually clear to me, even after all this discussion, that 1940s researchers had significantly better core mental habits / mindsets for alignment work than 2010s researchers. A few counter-points:

  • A lot of the best minds worked on QM in the early 20th century, but I don’t see clear evidence that QM progressed differently than AI is progressing today; that is, I don’t know of a clear case that falsifies the hypothesis “all the differences in output are due to AI and QM as cognitive problems happening to involve inherently different kinds and degrees of difficulty”. In both cases, it seems like people did a good job of applying conventional scientific methods and occasionally achieving conceptual breakthroughs in conventional scientific ways; and in both cases, it seems like there’s a huge amount of missing-the-forest-for-the-trees, not-seriously-thinking-about-the-implications-of-beliefs, and generally-approaching-philosophy-ish-questions-flippantly. It took something like 50 years to go from “Schrödinger’s cat is weird” to “OK /maybe/ macroscopic superposition-ish things are real” in physics, and “maybe macroscopic superposition-ish things are real” strikes me as much more obvious and much less demanding of sustained theorizing than, e.g., ‘we need to prioritize decision theory research ASAP in order to prevent superintelligent AI systems from destroying the world’. Even von Neumann had non-naturalist views about QM, and if von Neumann is a symptom of intellectual degeneracy then I don’t know what isn’t.

  • Ditto for the development of nuclear weapons. I don’t see any clear examples of qualitatively better forecasting, strategy, outside-the-box thinking, or scientific productivity on this topic in e.g. the 1930s, compared to what I’d expect to see today. (Though this comparison is harder to make because we’ve accumulated a lot of knowledge and hard experience with technological GCR as a result of this and similar cases.) The near-success of the secrecy effort might be an exception, since that took some loner agency and coordination that seems harder to imagine today. (Though that might also have been made easier by the smaller and less internationalized scientific community of the day, and by the fact that world war was on everyone’s radar?)

  • Turing and I. J. Good both had enough puzzle pieces to do at least a little serious thinking about alignment, and there was no particular reason for them not to do so. The 1956 Dartmouth workshop shows “maybe true AI isn’t that far off” was at least taken somewhat seriously by a fair number of people (though historians tend to overstate the extent to which this was true). If 1940s researchers were dramatically better than 2010s researchers at this kind of thing, and the decay after the 1940s wasn’t instantaneous, I’d have expected at least a hint of serious thinking-for-more-than-two-hours about alignment from at least one person working in the 1950s-1960s (if not earlier).

Rob Bensinger:

Here’s a different hypothesis: Human brains and/or all of the 20th century’s standard scientific toolboxes and norms are just really bad at philosophical/conceptual issues, full stop. We’re bad at it now, and we were roughly equally bad at it in the 1940s. A lot of fields have slowed down because we’ve plucked most of the low-hanging fruit that doesn’t require deep philosophical/conceptual innovation, and AI in particular happens to be an area where the things human scientists have always been worst at are especially critical for success.

Wei Dai:

Ok, so the story I’m forming in my mind is that we’ve always been really bad at philosophical/conceptual issues, and past philosophical/conceptual advances just represent very low-hanging fruit that has already been picked. When we invented mailing lists / blogs, the advantage over traditional academic communications allowed us to reach a little higher and pick a few more fruits, but progress is still very limited because we’re still not able to reach very high in an absolute sense, and making progress this way depends on gathering together enough hobbyists with the right interests and resources, which is a rare occurrence. Rob, I’m not sure how much of this you endorse, but it seems like the best explanation of all the relevant facts I’ve seen so far.

Rob Bensinger:

I think the object-level philosophical progress via mailing lists / blogs was tied to coming up with some good philosophical methodology. One simple narrative about the global situation (pretty close to the standard narrative) is that before 1880 or so, human inquiry was good at exploring weird nonstandard hypotheses, but bad at rigorously demanding testability and precision of those hypotheses. Human inquiry between roughly 1880 and 1980 solved that problem by demanding testability and precision in all things, which (combined with prosaic knowledge accumulation) let them grab a lot of low-hanging scientific fruit really fast, but caused them to be unnecessarily slow at exploring any new perspectives that weren’t 100% obviously testable and precise in a certain naive sense (which led to lack-of-serious-inquiry into “weird” questions at the edges of conventional scientific activities, like MWI and Newcomb’s problem).

Bayesianism, the cognitive revolution, the slow fade of positivism’s influence, the random walk of academic one-upmanship, etc. eventually led to more sophistication in various quarters about what kind of testability and precision are important by the late 20th century, but this process of synthesizing ‘explore weird nonstandard hypotheses’ with ‘demand testability and precision’ (which are the two critical pieces of the puzzle for ‘do unusually well at philosophy/forecasting/etc.’) was very uneven and slow. Thus you get various little islands of especially good philosophy-ish thinking showing up at roughly the same time here and there, including parts of analytic philosophy (e.g., Drescher), mailing lists (e.g., Extropians), and psychology (e.g., Tetlock).

Subthread 3

Vladimir Slepnev:

Eliezer, your position is very sharp. A couple of questions, then:

  1. Do you think e.g. Scott Aaronson’s work on quantum computing today falls outside the “weird regime where it could solve hard problems”?

  2. Do you have a clear understanding of why e.g. Nick Bostrom isn’t excited about TDT/UDT?

Wei Dai:

Vladimir, can you clarify what you mean by “isn’t excited”? Nick did write a few paragraphs about the relevance of decision theory to AI alignment in his Superintelligence, and cited TDT and UDT as “newer candidates [...] which are still under development”. I’m not sure what else you’d expect, given that he hasn’t specialized in decision theory in his philosophy work? Also, what’s your own view of what’s causing academia to not be able to make these “outsider steps”?

Vladimir Slepnev:

Wei, at some point you thought of UDT as the solution to anthropic reasoning, right? That’s Bostrom’s specialty. So if you are right, I’d expect more than a single superficial mention.

My view is that academia certainly tends to go off in wrong directions, and it was always like that. But its direction can be influenced with enough effort and understanding; it’s been done many times, and the benefits of doing that are too great to overlook.

Wei Dai:

I’m not sure; maybe he hasn’t looked into UDT closely enough to understand the relevance to anthropics, or he’s committed to a probability view? Probably Stuart has a better idea of this than I do. Oh, I do recall that when I attended a workshop at FHI, he asked me some questions about UDT that seemed to indicate that he didn’t understand it very well. I’m guessing he’s probably just too busy to do object-level philosophical investigations these days.

Can you give some past examples of academia going off in the wrong direction, and that being fixed by outsiders influencing its direction?

Vladimir Slepnev:

Why do you need the “fixed by outsiders” bit? I think it’s easier to change the direction of academia while being in academia, and that’s been done many times.

Maxim Kesin:

Vladimir Slepnev: The price of admission is pretty high for people who can do otherwise productive work, no? Especially since very few members of the club can have direction-changing impact. Something like finding and convincing existing high-standing members, preferably several of them, seems like a better strategy than joining the club and doing it from the inside yourself.

Wei Dai:

Vladimir, on LW you wrote “More like a subset of steps in each field that need to be done by outsiders, while both preceding and following steps can be done by academia.” If some academic field is going in a wrong direction because it’s missing a step that needs to be done by outsiders, how can someone in academia change its direction? I’m confused… Are you saying outsiders should go into academia in order to change its direction, after taking the missing “outsider steps”? Or that there is no direct past evidence that outsiders can change academia’s direction, but there’s evidence that insiders can, and that serves as Bayesian evidence that outsiders can too? Or something else?

Vladimir Slepnev:

I guess I shouldn’t have called them “outsider steps”; more like “newcomer steps”. Does that make sense?

Eliezer Yudkowsky:

There’s an old question: “What does the Bible God need to do for the Christians to say he is not good?” What would academia need to do before you let it go?

Vladimir Slepnev:

But I don’t feel abused! My interactions with academia have been quite pleasant, and reading papers usually gives me nice surprises. When I read your negative comments about academia, I mostly just get confused. At least from what I’ve read in this discussion today, it seems like the mystical force that’s stopping people like Bostrom from going fully on board with ideas like UDT is simple miscommunication on our part, not anything more sinister. If our arguments for using decisions over probabilities aren’t convincing enough, perhaps we should work on them some more.

Wei Dai:

Vladimir, surely those academic fields have had plenty of infusion of newcomers in the form of new Ph.D. students, but the missing steps only got done when people tried to do them while remaining entirely outside of academia. Are you sure the relevant factor here is “new to the field” rather than “doing work outside of academia”?

Stuart Armstrong:

Academic fields are often productive, but narrow. Saying “we should use decision theory instead of probability to deal with anthropics” falls outside of most of the relevant fields, so few academics are interested, because it doesn’t solve the problems they are working on.

Wei Dai:

Vladimir, a lot of people on LW didn’t have much trouble understanding UDT as informally presented there, or recognizing it as a step in the right direction. If joining academia makes somebody much less able to recognize progress in decision theory, that seems like a bad thing, and we shouldn’t be encouraging people to do that (at least until we figure out what exactly is causing the problem and how to fix or avoid it on an institutional or individual level).

Vladimir Slepnev:

I think it’s not surprising that many LWers agreed with UDT, because most of them were introduced to the topic by Eliezer’s post on Newcomb’s problem, which framed the problem in a way that emphasized decisions over probabilities. (Eliezer, if you’re listening, that post of yours was the single best example of persuasion I’ve seen in my life, and for a good goal too. Cheers!) So there’s probably no statistical effect saying outsiders are better at grasping UDT on average. It’s not that academia is lacking some decision theory skill; they just haven’t bought our framing yet. When/if they do, they will be uniquely good at digging into this idea, just as with many other ideas. [An illustrative sketch of that framing follows this comment.]

If the above is true, then refusing to pay the fixed cost of getting our ideas into academia seems clearly wrong. What do you think?
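[Editor’s note: as a rough illustration of the “decisions over probabilities” framing mentioned above, here is a minimal sketch of the standard Newcomb’s-problem payoffs, scored by evaluating each available decision directly by its expected payoff. The 0.99 predictor accuracy is an assumed parameter; the dollar amounts are the conventional ones from the literature, not figures from this conversation.]

```python
# Minimal sketch: score each Newcomb decision by its expected payoff, given a
# predictor assumed to anticipate your choice correctly with probability ACCURACY.
# Box A (transparent) holds $1,000; Box B holds $1,000,000 only if the predictor
# predicted you would take Box B alone.

ACCURACY = 0.99  # assumed predictor accuracy; any value above ~0.5005 gives the same ordering


def expected_payoff(one_box: bool, accuracy: float = ACCURACY) -> float:
    """Expected dollars for a decision, with the predictor right with probability `accuracy`."""
    if one_box:
        # Predictor usually foresaw one-boxing, so Box B is usually full.
        return accuracy * 1_000_000
    # Two-boxing: the $1,000 is guaranteed; the $1,000,000 appears only if the predictor erred.
    return 1_000 + (1 - accuracy) * 1_000_000


if __name__ == "__main__":
    print(f"one-box expected payoff: ${expected_payoff(True):,.0f}")   # ~$990,000
    print(f"two-box expected payoff: ${expected_payoff(False):,.0f}")  # ~$11,000
```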

Subthread 4

Stuart Armstrong:

I think the problem is a mix of specialisation and lack of urgency. If I’d been willing to adapt to the format, I’m sure I could have got my old pro-SIA arguments published. But anthropics wasn’t ready for an “ignore the big probability debates you’ve been having; anthropic probability doesn’t exist” paper. And those who were interested in the fundamental interplay between probability and decision theory weren’t interested in anthropics (and I wasn’t willing to put the effort in to translate it into their language).

This is where the lack of urgency comes in. People found the paper interesting, I’d wager, but saw it as not saying anything about the questions they were interested in. And they had no real feeling that some questions were far more important than theirs.

Stuart Armstrong:

I’ve presented the idea to Nick a few times, but he never seemed to get it fully. It’s hard to ignore probabilities when you’ve spent your life with them.

Eliezer Yudkowsky:

I will mention, for whatever it’s worth, that I don’t think decision theory can eliminate anthropics. That’s an intuition I still find credible, and it’s possible Bostrom felt the same. I’ve also seen Bostrom contribute at least one decision theory idea to anthropic problems, during a conversation with him by instant messenger: a division-of-responsibility principle that UDT later rendered redundant.

Stuart Armstrong:

I also disagree with Eliezer about the usefulness of the “interruptible agents” paper. The math is fun but ultimately pointless, and there is little mention of AI safety. However, it was immensely useful for me to write that paper with Laurent, as it taught me so much about how to model things, and how to try and translate those models into things that ML people like. As a consequence, I can now design indifference methods for practically any agent, which was not the case before.

And of course the paper wouldn’t mention the hard AI safety problems—not enough people in ML are working on those. The aim was to 1) present part of the problem, 2) present part of the solution, and 3) get both of those sufficiently accepted that harder versions of the problem can then be phrased as “take known problem/solution X, and add an extra assumption...”

Rob Bensinger:

That rationale makes sense to me. I think the concern is: if the most visible and widely discussed papers in AI alignment continue to be ones that deliberately obscure their own significance in various ways, then the benefits from the slow build-up to being able to clearly articulate our actual views in mainstream outlets may be outweighed by the costs from many other researchers internalizing the wrong take-aways in the intervening time. This is particularly true if many different build-ups like this are occurring simultaneously, over many years of incremental progress toward just coming out and saying what we actually think.

I think this is a hard problem, and one MIRI has repeatedly had to deal with. Very few of MIRI’s academic publications even come close to giving a full rationale for why we care about a given topic or result. The concern is with making it standard practice for high-visibility AI alignment papers to be at least somewhat misleading (in order to get wider attention, meet less resistance, get published, etc.), rather than with the interruptibility paper as an isolated case; and this seems like a larger problem for overstatements of significance than for understatements.

I don’t know how best to address this problem. Two approaches MIRI has tried before, which might help FHI navigate this, are: (1) writing a short version of the paper for publication that doesn’t fully explain the AI safety rationale, and a longer eprint of the same paper that does explain the rationale; and/or (2) explaining results’ significance more clearly and candidly in the blog post announcing the paper.

Subthread 5

Eliezer Yudkowsky:

To put this yet another way, most human bureaucracies and big organizations don’t do science. They have incentives for the people inside them which get them to do things other than science. For example, in the FBI, instead of doing science, you can best advance your career by closing big-name murder cases… or whatever. In the field of psychology, instead of doing science, you can get a lot of undergraduates into a room and submit obscured-math impressive-sounding papers with a bunch of tables that claim a p-value less than 0.05. Among the ways we know that this has little to do with science is that the papers don’t replicate. P-values are rituals[1], and being surprised that the rituals don’t go hand-in-hand with science says you need to adjust your intuitions about what is surprising. It’s like being surprised that your prayers aren’t curing cancer and asking how you need to pray differently. [A toy simulation at the end of this post illustrates the replication point.]

Now, it may be that separately from the standard incentives, decades later, a few heroes get together and try to replicate some of the most prestigious papers. They are doing science. Maybe somebody inside the FBI is also doing science. Lots of people in Christian religious organizations, over the last few centuries, did some science, though fewer now than before. Maybe the public even lauded the science they did, and they got some rewards. It doesn’t mean the Catholic Church is set up to teach people how to do real science, or that this is the primary way to get ahead in the Catholic Church, such that status-seekers will be driven to seek their promotions by doing great science.

The people doing real science by trying to replicate psychology studies may report ritual p-values and submit for ritual peer-review-by-idiots. Similarly, some doctors in the past no doubt prayed while giving their patients antibiotics. It doesn’t mean that prayer works some of the time. It means that these heroes are doing science, and separately, doing bureaucracy and a kind of elaborate ritual that is what our generation considers to be prestigious and mysterious witch-doctery.

[1] https://arbital.com/p/likelihoods_not_pvalues/?l=4x
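[Editor’s note: a toy simulation of the replication point above, under assumed parameters (mostly-null hypotheses, small underpowered samples, and a publish-only-if-significant filter). It illustrates why “p < 0.05” by itself is weak evidence of a real effect; it is not a model of any particular field.]

```python
# Toy sketch: many underpowered studies plus a significance filter yield
# "p < 0.05" findings that mostly fail to replicate. All parameters are assumptions.
import random
import statistics

random.seed(0)


def significant(true_effect: float, n: int = 20) -> bool:
    """Simulate a small two-group study; return True if it reaches roughly p < 0.05."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.pvariance(treated) / n + statistics.pvariance(control) / n) ** 0.5
    return abs(diff / se) > 1.96  # ~ p < 0.05, two-sided


# Assume 90% of tested hypotheses are null and 10% have a modest real effect.
effects = [0.0] * 900 + [0.3] * 100
published = [e for e in effects if significant(e)]      # the significance filter
replicated = [e for e in published if significant(e)]   # one independent replication attempt each

print(f"published 'significant' findings: {len(published)}")
print(f"replicated on a second attempt:   {len(replicated)}")
```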