Field-Building and Deep Models

What is important in hiring/field-building in x-risk and AI alignment communities and orgs? I’ve had a few conversations on this recently, and I’m trying to write up key ideas publicly more regularly.

I had in mind the mantra ‘better written quickly than not written at all’, so you can expect some failures of enjoyability and clarity. No character represents any particular individual; each is an amalgam of thoughts I’ve had and that others have raised.


albert cares deeply about x-risk from AI, and wants to grow the field of alignment quickly; he also worries that people in the x-risk community err too much on the side of hiring people similar to themselves.

ben cares deeply about x-risk from AI, and thinks that we should grow the AI safety community slowly and carefully; he feels it’s important to ensure new members of the community understand what’s already been learned, and to avoid the Eternal September effect.


albert: So, I understand you care about picking individuals and teams that agree with your framing of the problem.

ben: That sounds about right—a team or community must share deep models of their problems to make progress together.

albert: Concretely on the research side, what research seems valuable to you?

ben: If you’re asking what I think is most likely to move the needle on alignment, then I’d point to MIRI’s and Paul’s respective research paths, and also some of the safety work being done at DeepMind and FHI.

albert: Right. I think there are also valuable teams being funded by FLI and Open Phil who think about safety while doing more mainstream capabilities research. More generally, I think you don’t need to hire people who think very similarly to you in your organisations. Do you disagree?

ben: That’s an interesting question. On the non-research side, my first thought is to ask what Y Combinator says about organisations. One thing we learn from YC is that the first 10-20 hires of your organisation will make or break it, especially the co-founders. Picking even a slightly suboptimal co-founder—someone who doesn’t perfectly fit your team culture, understand the product, and work well with you—is the easiest way to kill your company. This suggests to me a high prior on selectivity (though I haven’t looked in detail into the other research groups you mention).

albert: So you’re saying that if the x-risk community is like a small company it’s important to have similar views, and if it’s like a large company it’s less important? Because it seems to me that we’re more like a large company. There are certainly over 20 of us.

ben: While ‘size of company’ is close, it’s not quite it. You can have small companies like restaurants or corner stores where this doesn’t matter. The key notion is one of inferential distance.

To borrow a line from Peter Thiel: startups are very close to being cults, except that where cults are very wrong about something important, startups are very right about something important.

As founders build detailed models of some new domain, they also build an inferential distance of 10+ steps between themselves and the rest of the world. They start to feel like everyone outside the startup is insane, until the startup makes billions of dollars and the update propagates throughout the world (“Oh, you can just get people to rent out their own houses as a BnB”).

A founder has to make literally thousands of decisions based on their detailed models of the product/insight, and so you can’t have co-founders who don’t share at least 90% of the deep models.

albert: But it seems many x-risk orgs could hire people who don’t share our basic beliefs about alignment and x-risk. Surely you don’t need an office manager, grant writer, or web designer to share your feelings about the existential fate of humanity?

ben: Actually, I’m not sure I agree with that. It again comes down to how much the org is doing new things versus doing things that are central cases of a pre-existing industry.

At the beginning of Open Phil’s existence, they wouldn’t have been able to (say) employ a typical ‘hiring manager’, because designing the hiring process required deep models of what Open Phil’s strategy was and which variables mattered. For example, ‘how easily someone can tell you the strength and cause of their beliefs’ was an important variable for Open Phil.

Similarly, I believe the teams at CFAR and MIRI have optimised their workshops and research environments respectively, in ways that depend on the specifics of those particular workshops, retreats and research environments. A web designer needs to know the organisation’s goals well enough to model the typical user and how they need to interact with the site. An operations manager needs to know which financial trade-offs to make: how important is food versus travel versus the ergonomics of the workspace for a workshop? Having every team member understand the core vision is necessary for a successful organisation.

albert: I still think you’re overweighting these variables, but that’s an interesting argument. How exactly do you apply this hypothesis to research?

ben: It doesn’t apply trivially, but I’ll gesture at what I think: our community has particular models, a worldview, and a general culture that helped it notice the problem of AI in the first place, and that have produced some pretty outstanding research (e.g. logical induction, functional decision theory). I think that culture is a crucial thing to sustain, rather than something to be cut away from the insights it’s produced so far. It’s important that those working on furthering its insights and success deeply understand the worldview.

albert: I agree that having made progress on issues like logical induction is impressive and has a solid chance of being very useful for AGI design. And I have a better understanding of your position—sharing deep models of a problem is important. I just think that some other top thinkers will be able to make a lot of the key inferences themselves—look at Stuart Russell for example—and we can help that along by providing funding and infrastructure.

Maybe we agree on the strategy of providing great thinkers the space to think about and discuss these problems? For example, events where top AI researchers in academia are given the space to share models with researchers closer to our community.

ben: I think I endorse that strategy, or at least the low-fidelity version you describe. I expect we’d have further disagreements when digging down into the details, structure and framing of such events.

But I will say, when I’ve talked with alignment researchers at MIRI, something they want even more than people working on agent foundations or Paul’s agenda is people who grok a bunch of the models, still have disagreements, and work on ideas from a new perspective. I hope your strategy helps discover people who deeply understand the alignment problem and have a novel approach to it.


For proofreads on various versions of this post, my thanks to Roxanne Heston, Beth Barnes, Lawrence Chan, Claire Zabel and Raymond Arnold. For more extensive editing (aka telling me to cut a third of it), my thanks to Laura Vaughan. Naturally, this does not imply endorsement from any of them (most actually had substantial disagreements).