AI safety undervalues founders
TL;DR: In AI safety, we systematically undervalue founders and field‑builders relative to researchers and prolific writers. This status gradient pushes talented would‑be founders and amplifiers out of the ecosystem, slows the growth of research orgs and talent funnels, and bottlenecks our capacity to scale the AI safety field. We should deliberately raise the status of founders and field-builders and lower the friction for starting and scaling new AI safety orgs.
Epistemic status: A lot of hot takes with less substantiation than I’d like. Also, there is an obvious COI in that I am an AI safety org founder and field-builder.
Coauthored with ChatGPT.
Why boost AI safety founders?
Multiplier effects: Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety. At MATS, mentor applications are increasing 1.5x/year and scholar applications are increasing even faster, but deployed research talent is only increasing at 1.25x/year. If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires; training programs and founders are the primary constraints.
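As a rough back-of-envelope check (my own arithmetic using the growth figures above, not a MATS statistic), the current rate of deployed-talent growth compounds to well below the target:

$$1.25^{8} \approx 6\times, \qquad 10^{1/8} \approx 1.33\times/\text{yr}, \qquad 100^{1/8} \approx 1.78\times/\text{yr}$$

That is, 1.25x/year over 8 years yields only ~6x, short of even the low end of 10-100x, which would require sustained growth of roughly 1.33-1.78x/year.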
Anti-correlated attributes: “Founder‑mode” is somewhat anti‑natural to “AI concern.” The cognitive style most attuned to AI catastrophic risk (skeptical, risk‑averse, theory-focused) is not the same style that woos VCs, launches companies, and ships MVPs. If we want AI safety founders, we need to counterweight the selection against risk-tolerant cognitive styles to prevent talent drift and attract more founder-types to AI safety.
Adverse incentives: The dominant incentive gradients in AI safety point away from founder roles. Higher social status, higher compensation, and better office/advisor access often accrue to research roles, so the local optimum is “be a researcher,” not “found something.” Many successful AI safety founders work in research-heavy roles (e.g., Buck Shlegeris, Beth Barnes, Adam Gleave, Dan Hendrycks, Marius Hobbhahn, Owain Evans, Ben Garfinkel, Eliezer Yudkowsky) and the status ladder seems to reward technical prestige over building infrastructure. In mainstream tech, founders are much higher status than in AI safety, and e/accs vs. AI safers are arguably in competition for VC resources and public opinion.
Founder effects: AI safety (or at least security) seems on the verge of becoming mainstream and the AI safety ecosystem should capture resources or let worse alternatives flourish. Unlikely allies, including MAGA (e.g., Steve Bannon, Marjorie Taylor-Greene), the child-safety lobby, and Encode AI, recently banded together to defeat Ted Cruz’s proposed 10-year moratorium on state AI legislation. Opinion polls indicate AI safety is a growing public concern. Many VC-backed AI security startups have launched this year (e.g., AISLE, Theorem, Virtue AI, Lucid Computing, TamperSec, Ulyssean), including via YC. We have the chance to steer public interest and capital towards greater impact, but only if we can recruit and deploy founders fast enough.
How did we get here?
Academic roots: The founders of Effective Altruism and Rationalism, the movements that popularized AI safety, were largely academics and individual contributors working in tech, not founders and movement builders. Longtermist EA and Rationalist cultures generally reward epistemic rigor, moral scrupulosity, and “lone genius” technical contributions more than building companies, shipping products, and coordinating people. Rationalists valorize “wizard power”, like making original research contributions, over “king power”, like raising and marshaling armies of researchers to solve AI alignment.
Biased spotlights: AI safety ecosystem spotlights like 80,000 Hours selectively amplify researchers and academics over founders. When AI safety founders are featured on the 80,000 Hours Podcast, they are almost always in research-heavy roles. Significant AI safety field-building orgs (e.g., BlueDot, MATS, Constellation, LISA, PIBBSS, ARENA, ERA, Apart, Pivotal) or less-influential research orgs (e.g., Apollo, EleutherAI, Goodfire, Timaeus) are generally not given much attention. The 80,000 Hours career review on “Founder of new projects tackling top problems” feels like a stub. Open Philanthropy RFPs technically support funding for new organizations, but this feels overshadowed by the focus on individual contributors in their branding.
Growth-aversion: AI safety grantmakers have (sometimes deliberately) throttled the growth of nascent orgs. The vibe that “rapid org scaling is risky” makes founding feel counter‑cultural. Throttling orgs can be correct in specific cases, but it generally creates a disincentive towards building by reducing confidence in grantmaker support for ambitious projects. An influential memo from 2022 argued against “mass movement building” in AI safety on the grounds that it would dilute the quality of the field; subsequently, frontier AI companies grew 2-3x/year, apparently unconcerned by dilution. Training programs (e.g., BlueDot, MATS, ARENA) and incubators (e.g., Catalyze Impact, Seldon Lab, Constellation Incubator) arrived late relative to need; even now, they occupy lower-status positions than the research orgs they helped build.
Potential counter-arguments
We don’t have enough good ideas to deploy talent at scale, so founders/field-builders aren’t important. I disagree; I think there are many promising AI safety research agendas that can absorb talent for high impact returns (e.g., AI control, scalable oversight, AI governance, open-weight safety, mech interp, unlearning, cooperative AI, AIXI safety, etc.). Also, if ideas are the bottleneck, a “hits-based approach” seems ideal! We should be launching more AI safety ideas bounties and contests, agenda incubators like Refine and the PIBBSS x Iliad residency, and research programs like AE Studio’s “Neglected Approaches” initiative. Most smart people are outside the AI safety ecosystem, so outreach and scaling seem critical to spawning more AI safety agendas.
We should be careful not to dilute the quality of the field by scaling too fast. I confess that I don’t really understand this concern. If outreach funnels attract a wave of low-caliber talent to AI safety, we can enforce high standards for research grants and second-stage programs like ARENA and MATS. If forums like LessWrong or the EA Forum become overcrowded with low-caliber posts, we can adjust content moderation or the effect of karma on visibility. As a last resort, field growth could be scaled back via throttled grant funding. Additionally, growing the AI safety field is far from guaranteed to reduce the average quality of research, as most smart people are not working on AI safety and, until recently, AI safety had poor academic legibility. Even if growing the field reduces the average researcher quality, I expect this will result in more net impact.
Great founders don’t need help/coddling; they make things happen regardless. While many great founders succeed in the absence of incubators or generous starting capital, Y Combinator has produced some great startups! Adding further resources to aid founders seems unlikely to be negative value and will likely help potential founders who lack access to high-value spaces like Constellation, LISA, or FAR Labs, which are frequented by grantmakers and AI safety tastemakers. As an example, if not for Lightcone Infrastructure’s Icecone workshop in Dec 2021-Jan 2022, I would probably have found it hard to make the necessary connections and positive impressions to help MATS scale.
What should we do?
Narrative shift: Prominent podcasts like 80,000 Hours should publish more interviews with AI safety founders and field-builders. Someone should launch an “AI safety founders” podcast/newsletter that spotlights top founders and their journeys.
Career surfaces: Career advisors like 80,000 Hours and Probably Good should make “AI safety org founder” and “AI safety field-builder” a first‑class career path in guides and advising. Incubators like Halcyon Futures, Catalyze Impact, Seldon Lab, Constellation Incubator, etc. should be given prominence.
Capital surfaces: Funders like Open Philanthropy should launch RFPs explicitly targeted towards new org formation with examples of high-impact projects they want founded.
Social surfaces: AI safety hubs like Constellation, LISA, FAR Labs, and Mox should host events for aspiring founders. Field-building programs should launch founders networks to provide warm intros to mentors/advisors, grantmakers/VCs, and fiscal sponsorship orgs.
How to become a founder
Apply to an incubator that understands AI safety, like Halcyon Futures, Catalyze Impact, Seldon Lab, Constellation Incubator, Fifty Years 5050 AI, or Entrepreneur First def/acc. YC has also funded several AI security and interpretability startups.
Draft a 1‑page pitch or theory of change and circulate it via Slack groups, forums, office happy hours, or friends.
Join the AI Safety Founders Network and ask for feedback on your idea.
Apply to RFPs that allow org funding, like Open Philanthropy’s “Funding for work that builds capacity to address risks from transformative AI”.
Talk to VCs/angels when there’s a plausible revenue path. Brainstorm for-profit AI alignment org ideas.
I want to register disagreement. Multiplier effects are difficult to get and easy to overestimate. It’s very difficult to get other people working on the right problem, rather than slipping off and working on an easier but ultimately useless problem. From my perspective, it looks like MATS fell into this exact trap. MATS has kicked out ~all the mentors who were focused on real problems (in technical alignment) and has a large stack of new mentors working on useless but easy problems.
Aren’t the central examples of founders in AI Safety the people who founded Anthropic, OpenAI, and arguably Deepmind? Right after that, Mechanize comes to mind.
I am not fully sure what you mean by founders, but it seems to me that the best organizations were founded by people who also wrote a lot, and generally developed a good model of the problems in parallel to running an organization. Even this isn’t a great predictor. I don’t really know what is. It seems like generally working in the space is just super high variance.
To be clear, overall I do think many more people should found organizations, but the arguments in this post seem really quite weak. The issue is really not that otherwise we “can’t scale the AI Safety field”. If anything it goes the other way around! If you just want to scale the AI safety field, go work at one of the existing big organizations like Anthropic, or Deepmind, or Far Labs or whatever. They can consume tons of talent, and you can probably work with them on capturing more talent (of course, I think the consequences of doing so for many of those orgs would be quite bad, but you don’t seem to think so).
Also, to expand some more on your coverage of counterarguments:
No, you can’t, because the large set of people you are trying to “filter out” will now take an adversarial stance towards you as they are not getting the resources they think they deserve from the field. This reduces the signal-to-noise ratio of almost all channels of talent evaluation, and in the worst case produces quite agentic groups of people actively trying to worsen the judgement of the field in order to gain entry.
I happen to have written a lot about this just this week: Paranoia: A Beginner’s Guide, for example, has an explanation of lemons markets that applies straightforwardly to grant evaluations and program applications.
This is a thing that has happened all over the place; see, for example, the pressure on elite universities to drop admission standards and continue grade inflation, exerted by the many people who are now part of the university system but wouldn’t have been in previous decades.
Summoning adversaries, especially ones that have built an identity around membership in your group, should be done very carefully. See also Tell people as early as possible it’s not going to work out, which I also happen to have published this week.
Yes, and this was, of course, quite bad for the world? I don’t know, maybe you are trying to model AI safety as some kind of race between AI Safety and the labs, but I think this largely fails to model the state of the field.
Like, again, man, do you really think the world would be at all different in terms of our progress on safety if everyone who works on whatever applied safety is supposedly so scalable had just never worked there? Kimi K2 is basically as aligned and as likely to be safe when scaled to superintelligence as whatever Anthropic is cooking up today. The most you can say is that safety researchers have been succeeding at producing evidence about the difficulty of alignment, but of course that progress has been enormously set back by all the safety researchers working at the frontier labs which the “scaling of the field” is just shoveling talent into, which has pressured huge numbers of people to drastically understate the difficulty and risks from AI.
I mean, and many of them don’t! CEA has not been led by people with research experience for many years, and man, I would give so much to have ended up in a world that went differently. IMO Open Phil’s community building has deeply suffered from a lack of situational awareness and strategic understanding of AI, and so massively dropped the ball. I think MATS’s biggest problem is roughly that approximately no one on the staff is a great researcher themselves, or even attempts to do the kind of work you try to cultivate, which makes it much harder for you to steer the program.
Like, I am again all in favor of people starting more organizations, but man, we just need to understand that we don’t have the forces of the market on our side, and this means the premium on having organizations steered by people who have their own internal feedback loop and their own strategic map of the situation (which requires actively engaging with the core problems of the field) is much greater than it is in YC and the open market. The default outcome if you encourage young people to start an org in “AI Safety” is to just end up with someone making a bunch of vaguely safety-adjacent RL environments that get sold to big labs, which I’d guess makes things largely worse (I am not confident in this, but I am pretty confident it doesn’t make things much better).
And so what I am most excited about is people who do have good strategic takes starting organizations; to demonstrate that they have those takes, and to develop the necessary skills, they need to write and publish publicly (or at least receive mentorship for a substantial period of time from someone who does).
Thanks for reading and replying! I’ll be brief:
I consider the central examples of successful AI safety org founders to be the founders of Redwood, METR, Transluce, GovAI, Apollo, FAR AI, MIRI, LawZero, Pattern Labs, CAIS, Goodfire, Palisade, BlueDot, Constellation, MATS, Horizon, etc. Broader-focus orgs like 80,000 Hours, Lightcone, CEA and others have also had large impact. Apologies to all those I’ve missed!
I definitely think founders should workshop their ideas a lot, but this is not necessarily the same thing as publishing original research or writing on forums. Caveat: research org founders often should be leading research papers.
I don’t think that a great founder will have more impact in scaling the AI safety research field by working at “Anthropic, GDM, or FAR Labs” than by founding a new research org or training program.
Maybe I’m naive about how easy it is to adjust standards for grantmakers or training programs. My experience with MATS, LISA, and Manifund has involved a lot of selection, and the bar at MATS has risen every program for 4 years now, but I don’t feel a lot of pressure from rejected applicants to lower our standards. Maybe this will come with time? Or maybe it’s an ecosystem-wide effect? I see the pressure on elite university admissions as unideal, but not a field-killer; plus, AI safety seems far from this point. I acknowledge that you have a lot of experience with LTFF and other selection processes.
I don’t think AI companies scaling 2-3x/year is good for the world. I do think AI safety talent failing to keep up is bad for the world. It’s not so much an adversarial dynamic as a race to lower the alignment tax as much as possible at every stage.
I don’t think that Anthropic’s safety work is zero value. I’d like to see more people working on ASL-4/5 safety at Anthropic and Kimi, all else equal. I’d also like to see more AI safety training programs supplying talent, nonprofit orgs scaling auditing and research, and advocacy orgs shifting public perception.
I’m not sure how to think about CEA (and I lack your information here), but my first reaction is not “CEA should have been led by researchers.” I also don’t think Open Phil is a good example of an org that lacked researchers; some of the best worldview investigations research imo came from Open Phil staff or affiliates, including Joe Carlsmith, Ajeya Cotra, Holden Karnofsky, Carl Schulman, etc.
I’m more optimistic than you about the impact of encouraging more AI safety founders. I’m particularly excited by Halcyon Futures’ work in helping launch Goodfire, AIUC, Lucid Computing, Transluce, Seismic, AVERI, Fathom, etc. To date, I know of only two such RL dataset startups that spawned via AI safety (Mechanize, Calaveras) in contrast to ~150 AI safety-promoting orgs (though I’m sure there are other examples of AI safety-detracting startups).
I fully endorse more potential founders writing up pitches or theories of change for discussion on LW or founder networks! I think this can only strengthen their impact.
What?! Something terrible must be going on in your mechanisms for evaluating people (which, to be clear, isn’t surprising; indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y’all and you thought it was worth the tradeoff of having bigger cohorts, but you thinking it has “gone up continuously” shows a huge disconnect.
Like, these days at the end of a MATS program half of the people couldn’t really tell you why AI might be an existential risk at all. Their eyes glaze over when you try to talk about AI strategy. IDK, maybe these people are better ML researchers, but obviously they are worse contributors to the field than the people in the early cohorts.
Yeah, I mean, I do think I am a lot more pessimistic about all of these. If you want we can make a bet on how well things have played out with these in 5 years, deferring to some small panel of trusted third party people.
Agree. Making RL environments/datasets has only very recently become a highly profitable thing, so you shouldn’t expect much! I am happy to make bets that we will see many more in the next 1-2 years.
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
I don’t have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as “Iterators”. Also, while the fraction of the cohort that displays a strong “Connector” disposition seems to have decreased over time, I think that the raw number of strong Connectors has generally increased with program size due to our research diversity metric in mentor selection. I would argue that the phenomenon you are witnessing is an increasing pivot from more theoretical to empirical AI safety mentors and research agendas.
Based on my personal experience, I think the claim “half of MATS couldn’t tell you why AI might be an existential risk” is incorrect. I can’t speak to how MATS scholars have engaged with you on AI strategy, but I would bet that the average MATS scholar today spends a lot more time on ML experiments than reading AI safety strategy docs compared to three years ago. To be clear, I think this is a good thing! I respect your disagreement here. MATS has tried to run AI safety strategy workshops and reading groups many times in the past, but this has generally had low engagement relative to our seminar series (which features some prominent AI safety strategists anyways). If you have great ideas for how to better structure strategy workshops or generate interest, I would love to hear! (We are currently brainstorming this.)
I mean, in as much as one is worried about Goodhart’s law, and the issue in contention is adversarial selection, then the acceptance rate going down over time is kind of the premise of the conversation. Like, it would be evidence against my model of the situation if the acceptance rate had been going up (since that would imply MATS is facing less adversarial pressure over time).
Mentor ratings are the most interesting category to me. As you can imagine, I don’t care much for ML skill at the margin. CodeSignal is a bit interesting, though I am not familiar enough with it to interpret it, but I might look into it.
I don’t know whether you have any plots of mentor ratings over time broken out by individual mentor. My best guess is the reason why mentor ratings are going up is because you have more mentors who are looking for basically just ML skill, and you have successfully found a way to connect people into ML roles.
This is of course where most of your incentive gradient was pointing in the first place, as of course the entities that are just trying to hire ML researchers have the most resources, and you will get the most applicants for highly paid industry ML roles, which are currently among the most prestigious and most highly paid roles in the world (while of course being centrally responsible for the risk from AI that we are working on).
This is not counter-evidence to the accusation that scholar quality has been going downhill unless you add in several other assumptions.
It’s not supposed to be counter-evidence in its own right. I like to present the full picture.
What do you think are ways to identify good strategic takes? This is something that seems rather fuzzy to me. It’s not clear how people are judging criteria like this or what they think is needed to improve on this.
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I’d guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., impacting the intended system in roughly the intended way, except with the sign of the key effect flipped—it seems disturbingly common to have “paradoxical impact.”
I suspect the risk of paradoxical impact—even from your own work—is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it’s worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.
It is hard to predict this, but I think we could have done better (and can do better in the future still).
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us that the world would have been better off if we hadn’t made the problem known, since I roughly can’t imagine worlds where we do survive where the problem wasn’t made known, and I think it should be pretty expected with a problem this confusing that initially people will have little idea how to help, and so many initial attempts won’t. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn’t what you mean exactly, and there are ways we could have made the problem known that seemed less like “freaking out”? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).