Field-Building and Deep Models

What is important in hiring/field-building in x-risk and AI alignment communities and orgs? I had a few conversations on this recently, and I'm trying to publicly write up key ideas more regularly.

I had in mind the mantra 'better written quickly than not written at all', so you can expect some failures in enjoyability and clarity. No character represents any individual; each is an amalgam of thoughts I've had and that others have raised.


albert cares deeply about x-risk from AI, and wants to grow the field of alignment quickly; he also worries that people in the x-risk community err too much on the side of hiring people similar to themselves.

ben cares deeply about x-risk from AI, and thinks that we should grow the AI safety community slowly and carefully; he feels it's important to ensure new members of the community understand what's already been learned, and to avoid the Eternal September effect.


albert: So, I understand you care about picking individuals and teams that agree with your framing of the problem.

ben: That sounds about right—a team or community must share deep models of their problems to make progress together.

albert: Concretely, on the research side, what research seems valuable to you?

ben: If you're asking what I think is most likely to move the needle on alignment, then I'd point to MIRI's and Paul's respective research paths, and also some of the safety work being done at DeepMind and FHI.

albert: Right. I think there are also valuable teams being funded by FLI and Open Phil who think about safety while doing more mainstream capabilities research. More generally, I think you don't need to hire people who think very similarly to you in your organisations. Do you disagree?

ben: That's an interesting question. On the non-research side, my first thought is to ask what Y Combinator says about organisations. One thing we learn from YC is that the first 10-20 hires of your organisation will make or break it, especially the co-founders. Picking even a slightly suboptimal co-founder—someone who doesn't perfectly fit your team culture, understand the product, and work well with you—is the easiest way to kill your company. This suggests to me a high prior on selectivity (though I haven't looked in detail into the other research groups you mention).

albert: So you're saying that if the x-risk community is like a small company it's important to have similar views, and if it's like a large company it's less important? Because it seems to me that we're more like a large company. There are certainly over 20 of us.

ben: While 'size of company' is close, it's not quite it. You can have small companies like restaurants or corner stores where this doesn't matter. The key notion is one of inferential distance.

To borrow a line from Peter Thiel: startups are very close to being cults, except that where cults are very wrong about something important, startups are very right about something important.

As founders build detailed models of some new domain, they also build up an inferential distance of 10+ steps between themselves and the rest of the world. They start to feel like everyone outside the startup is insane, until the startup makes billions of dollars and the update propagates throughout the world ("Oh, you can just get people to rent out their own houses as a BnB").

A founder has to make literally thousands of decisions based on their detailed models of the product/insight, so you can't have co-founders who don't share at least 90% of the deep models.

albert: But it seems many x-risk orgs could hire people who don't share our basic beliefs about alignment and x-risk. Surely you don't need an office manager, grant writer, or web designer to share your feelings about the existential fate of humanity?

ben: Actually, I'm not sure I agree with that. It again comes down to how much the org is doing new things versus doing things that are central cases of a pre-existing industry.

At the beginning of Open Phil's existence they wouldn't have been able to (say) employ a typical 'hiring manager', because designing the hiring process required deep models of what Open Phil's strategy was and what variables mattered. For example, 'how easily someone can tell you the strength and cause of their beliefs' was important to Open Phil.

Similarly, I believe the teams at CFAR and MIRI have optimised their workshops and research environments respectively, in ways that depend on the specifics of those particular workshops/retreats and research environments. A web designer needs to know the organisation's goals well enough to model the typical user and how they need to interact with the site. An operations manager needs to know what financial trade-offs to make: how important is food for the workshop, versus travel, versus the ergonomics of the workspace? Having every team member understand the core vision is necessary for a successful organisation.

albert: I still think you're overweighting these variables, but that's an interesting argument. How exactly do you apply this hypothesis to research?

ben: It doesn't apply trivially, but I'll gesture at what I think: our community has particular models, a worldview and a general culture that helped it notice the problem of AI in the first place, and that has produced some pretty outstanding research (e.g. logical induction, functional decision theory). I think that culture is a crucial thing to sustain, rather than something to be cut away from the insights it's produced so far. It's important, for those working on furthering its insights and success, to deeply understand the worldview.

albert: I agree that having made progress on issues like logical induction is impressive and has a solid chance of being very useful for AGI design. And I have a better understanding of your position—sharing deep models of a problem is important. I just think that some other top thinkers will be able to make a lot of the key inferences themselves—look at Stuart Russell, for example—and we can help that along by providing funding and infrastructure.

Maybe we agree on the strategy of providing great thinkers with the space to think about and discuss these problems? For example, events where top AI researchers in academia are given the space to share models with researchers closer to our community.

ben: I think I endorse that strategy, or at least the low-fidelity one you describe. I expect we'd have further disagreements when digging down into the details, structure and framing of such events.

But I will say: when I've talked with alignment researchers at MIRI, something they want more than people working on agent foundations, or Paul's agenda, is people who grok a bunch of the models, still have disagreements, and work on ideas from a new perspective. I hope your strategy helps discover people who deeply understand the alignment problem and have a novel approach to it.


For proofreads on various versions of this post, my thanks to Roxanne Heston, Beth Barnes, Lawrence Chan, Claire Zabel and Raymond Arnold. For more extensive editing (aka telling me to cut a third of it), my thanks to Laura Vaughan. Naturally, this does not imply endorsement from any of them (most actually had substantial disagreements).