There should be more AI safety orgs

I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other program I’m involved with.

TL;DR: I argue why I think there should be more AI safety orgs. I’ll also provide some suggestions on how that could be achieved. The core argument is that there is a lot of unused talent and I don’t think existing orgs scale fast enough to absorb it. Thus, more orgs are needed. This post can also serve as a call to action for funders, founders, and researchers to coordinate to start new orgs.

This piece is certainly biased! I recently started an AI safety org and therefore obviously believe that there is/​was a gap to be filled.

If you think I’m missing relevant information about the ecosystem or disagree with my reasoning, please let me know. I genuinely want to understand why the ecosystem acts as it does right now and whether there are good reasons for it that I have missed so far.

Why?

Before making the case, let me point out that under most normal circumstances, it is probably not reasonable to start a new organization. It’s much smarter to join an existing organization, get mentorship, and grow the organization from within. Furthermore, building organizations is hard and comes with a lot of risks, e.g. due to a lack of funding or because there isn’t enough talent to join early on.

My core argument is that we’re very much NOT under normal circumstances and that, conditional on the current landscape and the problem we’re facing, we need more AI safety orgs. By that, I primarily mean orgs that can provide full-time employment to contribute to AI safety but I’d also be happy if there were more upskilling programs like SERI MATS, ARENA, MLAB & co.

Talent vs. capacity

Frankly, the level of talent applying to AI safety organizations and getting rejected is too high. We have recently started a hiring round and we estimate that a lot more candidates meet a reasonable bar than we could hire. I don’t want to go into the exact details since the round isn’t closed but from the current applications alone, you could probably start a handful of new orgs.

Many of these people could join top software companies like Google, Meta, etc. or even already are at these companies and looking to transition into AI safety. Apollo is a new organization without a long track record, so I expect the applications for other alignment organizations to be even stronger.

The talent supply is so high that a lot of great people even have to be rejected from SERI MATS, ARENA, MLAB, and other skill-building programs that are supposed to get more people into the field in the first place. Also, if I look at the people who get rejected from existing orgs like Anthropic, OpenAI, DM, Redwood, etc. it really pains me to think that they can’t contribute in a sustainable full-time capacity. This seems like a huge waste of talent and I think it is really unhealthy for the ecosystem, especially given the magnitude and urgency of AI safety.

Some people point to independent research as an alternative. I think independent research is a temporary solution for a small subset of people. It’s not very sustainable and has a huge selection bias. Almost anyone with a family or with existing work experience is not willing to take the risk. In my experience, women also have a disproportional preference against independent research compared to men, so the gender balance gets even worse than it already is (this is only anecdotal evidence, I have not looked at this in detail).

Furthermore, many people just strongly prefer working with others in a less uncertain, more regular environment of an organization, even if that organization is fairly new. It’s just more productive and more fun to work with a team than as an independent.

Additionally, getting funding for independent research right now is also quite hard, e.g. the LTFF is currently quite resource-constrained and has a high bar (not their fault!; update: may be less funding constrained now). All in all, independent research just really seems like a band-aid to a much bigger problem.

Lastly, we’re strongly undercounting talent that is not already in our bubble. There are a lot of people concerned about existential risks from AI that are already working at existing tech companies. They would be willing to “take the jump” and join AI safety orgs but they wouldn’t take the risk to do independent research. These people typically have a solid ML background and could easily contribute if they skilled up in alignment a bit.

In almost all normal industries, organizations are aware that their hires don’t really contribute for the first couple of months and see this education as an upfront investment. In AI safety, we currently have the luxury that we can hire people who can contribute from day one. While this is very comfortable for the organizations, it really is a bad sign for the ecosystem as a whole.

The opportunity costs are minimal

Even if more than half of new AI safety orgs failed, it seems plausible that the opportunity costs of funding them are minimal. A lot of the people who would be enabled by more organizations would just be unable to contribute in the counterfactual world. In fact, more people might join capability teams at existing tech companies for lack of alternatives.

In the cases where a new org would fail, their employees could likely try to join another AI safety org that is still going strong. That seems totally fine, they probably learned valuable skills in the meantime. This just seems like normal start-ups and companies operate and should not discourage us from trying.

I have heard an argument along the lines of “new orgs might lock up great talent that then can’t contribute at the frontier safety labs” which would imply some opportunity costs. This doesn’t seem like a strong argument to me. The people in the new orgs are free to move to big orgs and Anthropic/​DM/​OpenAI has a lot more money, status, compute, and mentorship available. If they want to snatch someone, they probably could. This feels like something that the people themselves should decide—right now they just don’t have that option in the first place.

We are wasting valuable time

Warning: This section is fairly speculative.

I know that some people have longer timelines than I do but even under more conservative timelines, it seems reasonable that there should be more people working on AI safety right now.

If we think that “solving” alignment is about as hard as the Manhatten project (130k people) or the Apollo project (400k people) then we need to scale the size of the community by about 3 orders of magnitude, assuming there are currently 100-500 full-time employees working on alignment. Let’s assume, for simplicity, Ajeya Cotra’s latest public median estimate for TAI of ~2040. This would imply about 3 OOMs in ~15 years or 1 OOM every 5 years. If you think it’s harder than the two named projects maybe 4 OOMs may be more accurate.

There are a couple of additional changes I’d personally like to make to this trajectory. First, I’d much rather eat up the first 2 OOMs early (maybe in the first 3-5 years) so we can build relevant expertise and engage in research agendas with longer payoff times. Second, I just don’t think the timelines are accurate and would rather calculate with 2033 or earlier. Under these assumptions, 10x every 2 years seems more appropriate.

A 10x growth every ~4 years could be done with a handful of existing orgs but a 10x growth every ~2 years probably requires more orgs. Since my personal timelines are much closer to the latter, I’m advocating for more organizations. Even on the slower end of the spectrum, we should probably not bank on the fact that the existing orgs are able to scale at the required pace and diversify our bets.

I sometimes hear a view that doing research now is almost irrelevant and we should keep a big war chest for “the end game”. I understand the intuition but it just feels wrong to me. Lots of research is relevant today and we do already get relevant feedback from empirical work on current AI systems. Evals and interpretability are obvious examples but research into scalable oversight or adversarial training can be done today as well and seems relevant for the future. Furthermore, if we wanted to spend >50% of the budget in the last year (assuming we knew when that was), we still need people to spend that money on. Building up the research and engineering capacity today seems already justified from a skill- and tool-building perspective alone.

The funding exists (maybe?)

I find it hard to get a good sense of the funding landscape right now (see e.g. this post), for example, I currently don’t have a good estimate of how much money OpenPhil has available and how much of that is earmarked for AI safety. Thus, I won’t speculate too much on the long-year funding plan of existing AI safety funders.

However, historically funding for AI safety has looked ~like this (copied from this post):

This indicates that OpenPhil, SFF, and others allocate high double-digit millions into AI safety every year and the trend is probably rising. My best guess is, therefore, that funders would be willing to support new orgs with reasonably-sized seed grants if they met their bar. I’m not very sure where that bar currently is, but I personally think it totally makes sense to fully fund a fairly junior group of researchers for a year who want to give it a go as long as they have a somewhat reasonable plan (as stated above, the opportunity costs just aren’t that high). Funders like OpenPhil might be hesitant to fund a new org but they may be more willing to fund a preliminary working group or collective that could then transition into an org if needed.

If funders agree with me here, I think it would be great if they signaled this willingness, e.g. by having a “starter package” for an org where a group of up to 5 people gets $100-300k (includes salary, compute, food, office, equipment, etc.) per person to do research for ~a year (where the exact amount depends on experience and promisingness). To give concrete examples, Jesse’s work on SLT and Kaarel’s work on DLK (or to be more precise, the agendas that followed from those works; the actual agendas are probably many months ahead of the public writing by now) are promising enough that I would totally give them $500k+ for a year to develop it further if I were in a position to move these amounts of money. If the project doesn’t work out, they could still close the org/​working group and move on.

Another option is to look for grants from other sources, e.g. non-EA funders or VCs.

AI safety is becoming more and more mainstream and philanthropic and private high-networth individuals are becoming interested in allocating money to the space. Furthermore, there are likely going to be government grants available for AI safety in the near future. These funding schemes are often opaque and the probability of success is lower than with traditional EA funders but it is now at least possible at all.

Another source of funding that I was not really considering until recently is VC funding. There are a couple of VCs who are interested in AI safety for the right reasons and it seems to me that there will be a large market around AI safety in some form. People just want to understand what’s going on in their models and what their limitations are, so there surely is a way to create products and services to satisfy these needs. It’s very important though to check if your strategy is compatible with the VCs’ vision and to be honest about your goals. Otherwise, you’ll surely end up in conflict and won’t be able to achieve the goal of reducing catastrophic risks from AI.

VC backing obviously reduces your option space because you need to eventually make a product. On the other hand, there are many more VCs than there are donors, so it may be worth the trade-off (also getting VC backing doesn’t exclude getting donations, it just makes them less likely).

How big is the talent gap?

I don’t know exactly how big the gap between available spots and available talent is but my best guess is ~2-20x depending on where you set the bar.

I don’t have any solid methodology here and I’d love for someone to do this estimate properly but my personal intuitions come from the following:

  1. Seeing our own hiring process and how many applications we have received so far where I think “This person could have a meaningful contribution within the first 3 months”.

  2. Seeing how many people were rejected from SERI MATS despite them probably being able to contribute.

  3. Seeing the number of people getting rejected from MLAB and REMIX despite me thinking that they “meet the bar”.

  4. Seeing the number of people getting rejected from Anthropic, DeepMind, Redwood, ARC, OpenAI, etc. despite me personally thinking that they should clearly be hired by someone.

  5. Seeing the number of people struggling to find full-time work after SERI MATS who I would intuitively judge as “hirable”.

  6. Seeing the number of highly qualified candidates who want to “take the jump” from a different ML job to AI safety but can’t due to lack of capacity.

  7. Talking to people who are currently hiring about the level of talent they have to reject due to lack of funding or mentorship capacity.

I can’t put any direct numbers on that because it would reveal information I’m not comfortable sharing in public but I can say that my intuitive aggregate conclusion from this results in a 2-20x talent-to-capacity gap. The 2x roughly corresponds to “could meaningfully contribute within a month of employment” and the 20x roughly corresponds to “could meaningfully contribute within half a year or less if provided with decent mentorship”.

How did we end up here?

My current feeling is that AI safety is underfunded and desperately needs more capacity to allow more talent to contribute. I’m not sure how we ended up here but I could imagine the following reasons to play a role.

  1. Reduction in available funding, e.g. due to the FTX crash and a general economic downturn.

  2. A lack of good funding opportunities, e.g. there just weren’t that many organizations that could effectively use a lot of money and were clearly trying to reduce catastrophic risks from AI.

  3. A lack of people being able to start new orgs: Starting and running an organization requires a different skillset than research. People with both the entrepreneurial skills to build an org and the technical talent to steer a good agenda may just be fairly rare.

  4. A conservative funding mindset from past funding choices: OpenPhil and others have tried to fund AI safety research for years and some of the bets just didn’t turn out as hoped. For example, OpenAI got a large starting grant from OpenPhil but then became one of the biggest AGI contributors. Redwood Research wanted to scale somewhat quickly but recently decided to scale down again.

Potentially this has led to a “stand-off” situation where funders are waiting for good opportunities and people who could start something see the change of the funding situation and are more hesitant to start something and as a result nothing happens.

I think a couple of new organizations and programs from the last 2 years, e.g. Redwood, FAR, CAIS, Epoch, SERI MATS, ARENA, etc., look like promising bets so far but I’d like to see many more orgs of the same caliber in the coming years.

Redwood has recently been criticized but I think their story to date is a mostly successful experiment for the AI safety community. In the worst-case interpretation, Redwood produced a lot of really talented AI safety researchers, in the best case, their scientific outputs were also quite valuable. I personally think, for example, that causal scrubbing is an important step for interpretability. Furthermore, I think trying something like MLAB and REMIX was very valuable both for the actual content value as well as the information value. So Redwood is mostly a win for AI safety funding in my books and should not be a reason for more conservative funding strategies just because not everything worked out as intended.

Some common counterarguments

“We don’t need more orgs, we need more great agendas”

There is a common criticism that the lack of orgs is due to the lack of agendas and if there were more great agendas, there would be more orgs. This criticism is often coupled with pointing out the lack of research leads who could execute such an agenda. While I think this is technically true, it doesn’t seem quite right.

The obvious question is how good agendas are developed in the first place. It may be through mentorship at another org or years of experience in academia/​independent research. Typically, the people who have this experience are hired by the big labs and therefore rarely start a new org. So if you think that only people who already have a great agenda should start an org, not starting new orgs is probably reasonable under current conditions. However, the assumption that you can only get good at research leadership through this path just seems wrong to me.

First, research leadership is obviously hard but you can learn it. I know lots of people who don’t have research leadership experience yet but who I would judge to be competent at running an agenda that could serve 3-10 people. Surely they would make many mistakes early on but they would grow into the position. Not trying seems like a strictly worse proposal than trying and failing (see section on opportunity costs).

Second, a great agenda just doesn’t seem like a necessary requirement. It seems totally fine for me to replicate other people’s work, extend existing agendas, or ask other orgs if they have projects to outsource (usually they do) for a year or so and build skills during that time. After a while, people naturally develop their own new ideas and then start developing their own agendas.

“The bottleneck is mentorship”

It’s clearly true that mentorship is a huge bottleneck but great mentors don’t fall from the sky. All senior people started junior and, in my personal opinion, lots of fairly junior people in the AI safety space would already make pretty good mentors early on, e.g. because they have prior management experience outside of AI safety or because they just have good research intuitions and ideas from the start.

Furthermore, a lot of the people who are in the position to provide mentorship are in full-time positions at big labs where they get access to compute, great team members, a big salary, etc. From that position, it just doesn’t seem that exciting to mentor lots of new people even if it had more impact in some cases. Therefore, a lot of very capable mentors are locked in positions where they don’t provide a lot of mentorship to newcomers because existing AI safety labs aren’t scaling fast enough to soak up the incoming talent streams. Thus, I personally think creating more orgs and thus places where mentorship can be learned and provided would be a good response.

Similar to the point about great agendas, it seems fine to me to just let people try, especially if the alternative is independent research or not contributing at all.

“The bottleneck is operations”

Before starting Apollo Research, I was really worried that operations would be a bottleneck. Now I’m not worried anymore. There are a lot of good operations people looking to get into AI safety and there is a lot of external help available.

Historically, EA orgs have sometimes struggled with finding operations talent but I think that was largely due to these orgs not providing an interesting value proposition to the talent they were looking for. For example, ops was sometimes framed as “everything that nobody else wanted to do” (see this post for details). So if you make a reasonable proposition, operations talent will come.

Furthermore, there is a lot of external help for operations in the EA sphere. Some organizations provide fiscal sponsorship and some operations help, e.g. Rethink Priorities or Effective Ventures. Impact ops (new org) may also be able to provide help early on and significantly help with operations.

“The downside risks are too high”

Another counterargument has been that there are large downside risks in funding AI safety orgs, e.g. because they may pull a switcheroo and cause further acceleration. This seems true to me and warrants increased caution but there seem to be lots of organizations that don’t have a high risk of falling into that category. Right now, training models and making progress on them is so expensive that any org with less than ~$50M of funding probably can’t train models close to the frontier anyway.

I personally think that many agendas have some capabilities externalities, e.g. building LM agents for evals or finding new insights for interpretability, but there are ways to address this. Orgs should carefully estimate the safety-capabilities balance of their outputs, including consulting with trusted external parties and then use responsible disclosure and differential publishing to circulate their outputs.

It feels like this is solvable by funding organizations where the leadership has a track record of caring about AI safety for the right reasons and has an agenda that isn’t too entangled with capabilities. It’s a hard problem but we should be able to make reasonable positive EV bets to address it.

“The best talent get jobs, the rest doesn’t matter”

I’ve heard the notion that AI safety talent is power law distributed, and therefore most impact will come from a small number of people anyways. This argument implies that it is fine to keep the number of AI safety researchers as low as it currently is (or scale very slowly) as long as the few relevant ones can contribute. I think there are a couple of problems with this argument.

  1. Even if the true number of those who can meaningfully contribute is low, the 100-500 we currently have is probably still too low given the size of the problem we’re dealing with.

  2. It’s hard to predict who the impactful people are going to be. Thus, spreading bets wider is better. Once there is more evidence about the quality of research, the best AI safety contributors can still coordinate to work in the same organization if they want to.

  3. The most impactful people can be enabled and accelerated by having a team around them. So more people are still better (as long as they meet a certain bar).

  4. Powerlaws stay powerlaws when you zoom out. Why is the right cutoff at the 100-500 people we currently have and not at 10k or 1M? You could use this argument to justify any arbitrary cutoff.

How?

I can’t provide a perfect recipe that will work for everyone but here are some scattered thoughts and suggestions about starting an org.

  1. Lots of people are “lurking”, i.e. they are not willing to start an org themselves but if someone else did it, they would be on board very quickly. The main bottleneck seems to be bringing these people together. It may be sufficient if one person says “I’m gonna commit” and takes charge of getting the org off the ground. If you have an understanding of AI safety and an entrepreneurial mindset, your talents are very much needed.

  2. It’s been done before. There are millions of companies, there are lots of books on how to build and run them, there are lots of people with good advice who’re happy to provide it, and the density of talent in AI safety makes it fairly easy to find good starting members.

  3. You don’t have to go all in right away. It’s fine (probably even recommended) to start as a group of 2-5 for half a year with independent funding and see whether you work well together and find a good agenda. If it works out, great, you can scale and make it more permanent. If it doesn’t, just disband, no hard feelings, you learned a lot on the way. I personally think SERI MATS & co are great environments to start such a test run because there are lots of good people around and you don’t have to make any long-term commitments.

  4. Agenda: You don’t need to have a great agenda right away, you can replicate existing agendas and start extending them when you understand the main bits. There is a long list of low-hanging fruit in empirical alignment that you can just jump on. Also, most senior people are happy to provide suggestions if you ask nicely. Furthermore, some people just have a reasonable agenda fairly quickly. For example, I think Jesse’s work on SLT and Kareel’s work on DLK have a lot of potential (or to be more precise, the agendas that followed from those works; the actual agendas are probably many months ahead of the public writing by now) despite both of them being fairly junior in traditional terms. I would generally recommend more people to just try and work on an agenda you’re excited about. In the best case, you find something important, in the worst case you get a lot of valuable experience.

  5. Operations: In case you’re a handful of people just trying out stuff for 6 months, operations is not that important. In case you’re more committed, good operations staff make a huge difference. I recommend asking around (the EA operations network is very well connected) and running a public hiring round (lots of good operations staff don’t typically hang out with the researchers).

  6. Funding: There are about 5-10 EA funders you can apply to. There are loads of non-EA funders you can apply to (but the application processes are very different). Furthermore, if your agenda can lead to a concrete product that is compatible with reducing catastrophic risks from AI, e.g. interpretability software, you can also think about a VC-backed route.

  7. Management: Excellent management is hard but it’s doable. In my (very limited) experience, it’s mostly a question of whether the organization makes it a priority to build up internal management capacity or not. You also don’t have to reinvent the wheel, there are lots of good books on management and the basic tips are a pretty reasonable start. From there, you can still experiment and iterate a lot and then you’ll get better over time.

  8. Learn from others: Others have already gone through the process of starting an org or program. Typically they are very willing to share ideas and I personally learned a lot from talking to people at EAGs and other occasions. We’ve also developed a couple of fairly general resources internally, e.g. about culture, hiring, management, the research process, etc. I’m happy to share them with externals under some conditions. Also, others would do many things differently the second time, you can learn a lot from their mistakes and try to prevent them.

  9. Fiscal sponsorship: Some organizations like Rethink Priorities, Charity Entrepreneurship, Effective Ventres, and others sometimes provide fiscal sponsorship, i.e. they host you as an organization, help with operations, manage your funding, etc. This can be very helpful, especially if you’re just a small group.

What now?

I don’t think everyone should try to start an org but I think there are some ways in which we, as a community, could make it easier for those who want to

  1. Mentally add it as one of the possible options for your career. You don’t have to do it but under specific circumstances, it may be the best choice.

  2. Create and support situations in which orgs could be founded, e.g. Lightcone, SERI MATS or the London AIS hub. These are great environments to try it out with low commitment while getting a lot of support from others.

  3. Make a “standard playbook” for founding an AIS org. It doesn’t have to be spectacular, just a short list of steps you should consider seems pretty helpful already (I may write one myself later this year in case someone is interested).

  4. Provide a “starter package”, e.g. let’s say $250-500k for 5 people for 6 months. If OpenPhil or SFF said that there is a fast process to get a starter package, I’m pretty sure more people would try to start a new research group/​org.

  5. If you’re part of an existing org, consider starting your own. I think in nearly all circumstances the answer is “stay where you are” but in some instances, it might be better to leave and start a new one. For example, if you think you have an agenda that could serve many more people but your current org can’t put you in a position to develop it, starting a new org may be the correct move.

Starting an org isn’t easy and lots of efforts will fail. However, given the lack of existing full-time AI safety capacity, it seems like we should try creating more orgs nonetheless. In the best case, a bunch of them will succeed, in the worst case, a “failed org” provides a lot of upskilling opportunities and leadership experience for the people involved.

I think the high quality of rejected candidates in AI safety is a very bad sign for the health of the community at the moment. The fact that lots of people with years of ML research and engineering experience with a solid understanding of alignment aren’t picked up is just a huge waste of talent. As an intuitive benchmark, I would like to get to a world where at least half of all SERI MATS scholars are immediately hired after the program and we aren’t even close to that yet.

Crossposted to EA Forum (117 points, 20 comments)