AI safety undervalues founders
TL;DR: In AI safety, we systematically undervalue founders and field‑builders relative to researchers and prolific writers. This status gradient pushes talented would‑be founders and amplifiers out of the ecosystem, slows the growth of research orgs and talent funnels, and bottlenecks our capacity to scale the AI safety field. We should deliberately raise the status of founders and field-builders and lower the friction for starting and scaling new AI safety orgs.
Epistemic status: A lot of hot takes with less substantiation than I’d like. Also, there is an obvious COI in that I am an AI safety org founder and field-builder.
Coauthored with ChatGPT.
Why boost AI safety founders?
Multiplier effects: Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety. At MATS, mentor applications are increasing 1.5x/year and scholar applications are increasing even faster, but deployed research talent is only increasing at 1.25x/year. If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires; training programs and founders are the primary constraints (see the back-of-the-envelope arithmetic at the end of this section).
Anti-correlated attributes: “Founder‑mode” is somewhat anti‑natural to “AI concern.” The cognitive style most attuned to AI catastrophic risk (skeptical, risk‑averse, theory-focused) is not the same style that woos VCs, launches companies, and ships MVPs. If we want AI safety founders, we need to counterweight the selection against risk-tolerant cognitive styles to prevent talent drift and attract more founder-types to AI safety.
Adverse incentives: The dominant incentive gradients in AI safety point away from founder roles. Higher social status, higher compensation, and better office/advisor access often accrue to research roles, so the local optimum is “be a researcher,” not “found something.” Many successful AI safety founders work in research-heavy roles (e.g., Buck Shlegeris, Beth Barnes, Adam Gleave, Dan Hendrycks, Marius Hobbhahn, Owain Evans, Ben Garfinkel, Eliezer Yudkowsky) and the status ladder seems to reward technical prestige over building infrastructure. In mainstream tech, founders are much higher status than in AI safety, and e/accs vs. AI safers are arguably in competition for VC resources and public opinion.
Founder effects: AI safety (or at least AI security) seems on the verge of becoming mainstream, and the AI safety ecosystem should capture these resources or watch worse alternatives flourish. Unlikely allies, including MAGA figures (e.g., Steve Bannon, Marjorie Taylor Greene), the child-safety lobby, and Encode AI, recently banded together to defeat Ted Cruz’s proposed 10-year moratorium on state AI legislation. Opinion polls indicate AI safety is a growing public concern. Many VC-backed AI security startups have launched this year (e.g., AISLE, Theorem, Virtue AI, Lucid Computing, TamperSec, Ulyssean), including via YC. We have the chance to steer public interest and capital towards greater impact, but only if we can recruit and deploy founders fast enough.
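As a rough back-of-the-envelope check on the growth-rate gap above (my own arithmetic, assuming the quoted rates compound annually):

$$1.25^{8} \approx 6, \qquad 10^{1/8} \approx 1.33, \qquad 100^{1/8} \approx 1.78$$

That is, at ~1.25x/year, deployed research talent grows only ~6x over 8 years; reaching 10x would require sustaining ~1.33x/year, and 100x would require ~1.78x/year, which is why marginal hires alone don’t close the gap.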
How did we get here?
Academic roots: The founders of Effective Altruism and Rationalism, the movements that popularized AI safety, were largely academics and individual contributors working in tech, not founders and movement builders. Longtermist EA and Rationalist cultures generally reward epistemic rigor, moral scrupulosity, and “lone genius” technical contributions more than building companies, shipping products, and coordinating people. Rationalists valorize “wizard power”, like making original research contributions, over “king power”, like raising and marshaling armies of researchers to solve AI alignment.
Biased spotlights: AI safety ecosystem spotlights like 80,000 Hours selectively amplify researchers and academics over founders. When AI safety founders are featured on the 80,000 Hours Podcast, they are almost always in research-heavy roles. Significant AI safety field-building orgs (e.g., BlueDot, MATS, Constellation, LISA, PIBBSS, ARENA, ERA, Apart, LASR, Pivotal) or less-influential research orgs (e.g., Apollo, EleutherAI, FAR.AI, Goodfire, Palisade, Timaeus) are generally not given much attention. The 80,000 Hours career review on “Founder of new projects tackling top problems” feels like a stub. Open Philanthropy RFPs technically support funding for new organizations, but this feels overshadowed by the focus on individual contributors in their branding.
Growth-aversion: AI safety grantmakers have (sometimes deliberately) throttled the growth of nascent orgs. The vibe that “rapid org scaling is risky” makes founding feel counter‑cultural. Throttling orgs can be correct in specific cases, but it generally creates a disincentive towards building by reducing confidence in grantmaker support for ambitious projects. An influential memo from 2022 argued against “mass movement building” in AI safety on the grounds that it would dilute the quality of the field; subsequently, frontier AI companies grew 2-3x/year, apparently unconcerned by dilution. Training programs (e.g., BlueDot, MATS, ARENA) and incubators (e.g., Catalyze Impact, Seldon Lab, Halcyon Futures) arrived late relative to need; even now, they occupy low status positions relative to research orgs they helped build.
Potential counter-arguments
We don’t have enough good ideas to deploy talent at scale, so founders/field-builders aren’t important. I disagree; I think there are many promising AI safety research agendas that can absorb talent for high impact returns (e.g., AI control, scalable oversight, AI governance, open-weight safety, mech interp, unlearning, cooperative AI, AIXI safety, etc.). Also, if ideas are the bottleneck, a “hits-based approach” seems ideal! We should be launching more AI safety ideas bounties and contests, agenda incubators like Refine and the PIBBSS x Iliad residency, and research programs like AE Studio’s “Neglected Approaches” initiative. Most smart people are outside the AI safety ecosystem, so outreach and scaling seem critical to spawning more AI safety agendas.
We should be careful not to dilute the quality of the field by scaling too fast. I confess that I don’t really understand this concern. If outreach funnels attract a large amount of low-caliber talent to AI safety, we can enforce high standards for research grants and second-stage programs like ARENA and MATS. If forums like LessWrong or the EA Forum become overcrowded with low-caliber posts, we can adjust content moderation or the effect of karma on visibility. As a last resort, field growth could be scaled back via throttled grant funding. Additionally, growing the AI safety field is far from guaranteed to reduce the average quality of research, as most smart people are not working on AI safety and, until recently, AI safety had poor academic legibility. Even if growing the field reduces the average researcher quality, I expect this will result in more net impact.
Great founders don’t need help/coddling; they make things happen regardless. While many great founders succeed in the absence of incubators or generous starting capital, Y Combinator has produced some great startups! Adding further resources to aid founders seems unlikely to be negative value and will likely help potential founders who lack access to high-value spaces like Constellation, LISA, or FAR Labs, which are frequented by grantmakers and AI safety tastemakers. As an example, if not for Lightcone Infrastructure’s Icecone workshop in Dec 2021-Jan 2022, I would probably have found it hard to make the necessary connections and positive impressions to help MATS scale.
What should we do?
Narrative shift: Prominent podcasts like 80,000 Hours should publish more interviews with AI safety founders and field-builders. Someone should launch an “AI safety founders” podcast/newsletter that spotlights top founders and their journeys.
Career surfaces: Career advisors like 80,000 Hours and Probably Good should make “AI safety org founder” and “AI safety field-builder” a first‑class career path in guides and advising. Incubators like Halcyon Futures, Catalyze Impact, Seldon Lab, Constellation Incubator, etc. should be given prominence.
Capital surfaces: Funders like Open Philanthropy should launch RFPs explicitly targeted towards new org formation with examples of high-impact projects they want founded.
Social surfaces: AI safety hubs like Constellation, LISA, FAR Labs, and Mox should host events for aspiring founders. Field-building programs should launch founders networks to provide warm intros to mentors/advisors, grantmakers/VCs, and fiscal sponsorship orgs.
How to become a founder
Apply to an incubator that understands AI safety, like Halcyon Futures, Catalyze Impact, Seldon Lab, Constellation Incubator, Fifty Years 5050 AI, or Entrepreneur First def/acc. YC has also funded several AI security and interpretability startups.
Draft a 1‑page pitch or theory of change and circulate it via Slack groups, forums, office happy hours, or friends.
Join the AI Safety Founders Network and ask for feedback on your idea.
Apply to RFPs that allow org funding, like Open Philanthropy’s “Funding for work that builds capacity to address risks from transformative AI”.
Talk to VCs/angels when there’s a plausible revenue path. Brainstorm for-profit AI alignment org ideas.
Aren’t the central examples of founders in AI Safety the people who founded Anthropic, OpenAI, and arguably DeepMind? Right after that, Mechanize comes to mind.
I am not fully sure what you mean by founders, but it seems to me that the best organizations were founded by people who also wrote a lot, and generally developed a good model of the problems in parallel to running an organization. Even this isn’t a great predictor. I don’t really know what is. It seems like generally working in the space is just super high variance.
To be clear, overall I do think many more people should found organizations, but the arguments in this post seem really quite weak. The issue is really not that otherwise we “can’t scale the AI Safety field”. If anything it goes the other way around! If you just want to scale the AI safety field, go work at one of the existing big organizations like Anthropic, or DeepMind, or FAR Labs or whatever. They can consume tons of talent, and you can probably work with them on capturing more talent (of course, I think the consequences of doing so for many of those orgs would be quite bad, but you don’t seem to think so).
Also, to expand some more on your coverage of counterarguments:
No, you can’t, because the large set of people you are trying to “filter out” will now take an adversarial stance towards you as they are not getting the resources they think they deserve from the field. This reduces the signal-to-noise ratio of almost all channels of talent evaluation, and in the worst case produces quite agentic groups of people actively trying to worsen the judgement of the field in order to gain entry.
I happen to have written a lot about this just this week: Paranoia: A Beginner’s Guide, for example, has an explanation of lemons markets that applies straightforwardly to grant evaluations and program applications.
This is a thing that has happened all over the place, see for example the pressures on elite universities to drop admission standards and continue grade inflation by the many people that are now part of the university system, but wouldn’t have been in previous decades.
Summoning adversaries, especially ones that have built an identity around membership in your group, should be done very carefully. See also Tell people as early as possible it’s not going to work out, which I also happen to have published this week.
Yes, and this was, of course, quite bad for the world? I don’t know, maybe you are trying to model AI safety as some kind of race between AI Safety and the labs, but I think this largely fails to model the state of the field.
Like, again, man, do you really think the world would be at all different in terms of our progress on safety if everyone who works on whatever applied safety is supposedly so scalable had just never worked there? Kimi K2 is basically as aligned and as likely to be safe when scaled to superintelligence as whatever Anthropic is cooking up today. The most you can say is that safety researchers have been succeeding at producing evidence about the difficulty of alignment, but of course that progress has been enormously set back by all the safety researchers working at the frontier labs which the “scaling of the field” is just shoveling talent into, which has pressured huge numbers of people to drastically understate the difficulty and risks from AI.
I mean, and many of them don’t! CEA has not been led by people with research experience for many years, and man I would give so much to have ended up in a world that went differently. IMO Open Phil’s community building has deeply suffered from a lack of situational awareness and strategic understanding of AI, and so massively dropped the ball. I think MATS’s biggest problem is roughly that approximately no one on the staff is a great researcher themselves, or even attempts to do the kind of work you try to cultivate, which makes it much harder for you to steer the program.
Like, I am again all in favor of people starting more organizations, but man, we just need to understand that we don’t have the forces of the market on our side, and this means the premium we get for having people steer the organizations who have their own internal feedback loop and their own strategic map of the situation, which requires actively engaging with the core problems of the field, is much greater than it is in YC and the open market. The default outcome if you encourage young people to start an org in “AI Safety” is to just end up with someone making a bunch of vaguely safety-adjacent RL environments that get sold to big labs, which my guess is makes things largely worse (I am not confident in this, but I am pretty confident it doesn’t make things much better).
And so what I am most excited about are people who do have good strategic takes starting organizations, and to demonstrate that they have that, and to develop the necessary skills, they need to write and publish publicly (or at least receive mentorship, for a substantial period of time, from someone who does).
Thanks for reading and replying! I’ll be brief:
I consider the central examples of successful AI safety org founders to be Redwood, METR, Transluce, GovAI, Apollo, FAR AI, MIRI, LawZero, Pattern Labs, CAIS, Goodfire, Palisade, BlueDot, Constellation, MATS, Horizon, etc. Broader-focus orgs like 80,000 Hours, Lightcone, CEA and others have also had large impact. Apologies to all those I’ve missed!
I definitely think founders should workshop their ideas a lot, but this is not necessarily the same thing as publishing original research or writing on forums. Caveat: research org founders often should be leading research papers.
I don’t think that a great founder will have more impact in scaling the AI safety research field by working at “Anthropic, GDM, or FAR Labs” relative to founding a new research org or training program.
Maybe I’m naive about how easy it is to adjust standards for grantmakers or training programs. My experience with MATS, LISA, and Manifund has involved a lot of selection and the bar at MATS has risen every program for 4 years now, but I don’t feel a lot of pressure from rejected applicants to lower our standards. Maybe this will come with time? Or maybe it’s an ecosystem-wide effect? I see the pressure to lower elite university admission standards as unideal, but not a field-killer; plus, AI safety seems far from this point. I acknowledge that you have a lot of experience with LTFF and other selection processes.
I don’t think AI companies scaling 2-3x/year is good for the world. I do think AI safety talent failing to keep up is bad for the world. It’s not so much an adversarial dynamic as a race to lower the alignment tax as much as possible at every stage.
I don’t think that Anthropic’s safety work is zero value. I’d like to see more people working on ASL-4,5 safety at Anthropic and Kimi, all else equal. I’d also like to see more AI safety training programs supplying talent, nonprofit orgs scaling auditing and research, and advocacy orgs shifting public perception.
I’m not sure how to think about CEA (and I lack your information here), but my first reaction is not “CEA should have been led by researchers.” I also don’t think Open Phil is a good example of an org that lacked researchers; some of the best worldview investigations research imo came from Open Phil staff or affiliates, including Joe Carlsmith, Ajeya Cotra, Holden Karnofsky, Carl Shulman, etc. (edit: which clearly informed OP grantmaking).
I’m more optimistic than you about the impact of encouraging more AI safety founders. I’m particularly excited by Halcyon Futures’ work in helping launch Goodfire, AIUC, Lucid Computing, Transluce, Seismic, AVERI, Fathom, etc. To date, I know of only two such RL dataset startups that spawned via AI safety (Mechanize, Calaveras) in contrast to ~150 AI safety-promoting orgs (though I’m sure there are other examples of AI safety-detracting startups).
I fully endorse more potential founders writing up pitches or theories of change for discussion on LW or founder networks! I think this can only strengthen their impact.
What?! Something terrible must be going on in your mechanisms for evaluating people (which, to be clear, isn’t surprising; indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y’all and you thought it was worth the tradeoff of having bigger cohorts, but you thinking it has “gone up continuously” shows a huge disconnect.
Like, these days at the end of a MATS program half of the people couldn’t really tell you why AI might be an existential risk at all. Their eyes glaze over when you try to talk about AI strategy. IDK, maybe these people are better ML researchers, but obviously they are worse contributors to the field than the people in the early cohorts.
Yeah, I mean, I do think I am a lot more pessimistic about all of these. If you want we can make a bet on how well things have played out with these in 5 years, deferring to some small panel of trusted third party people.
Agree. Making RL environments/datasets has only very recently become a highly profitable thing, so you shouldn’t expect much! I am happy to make bets that we will see many more in the next 1-2 years.
I feel actively excited about 2 of these, quite negative about 1 of them, and confused/neutral about the others.
Can you share which?
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
I don’t have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as “Iterators”. Also, while the fraction of the cohort that display strong “Connector” disposition seems to have decreased over time, I think that the raw number of strong Connectors has generally increased with program size due to our research diversity metric in mentor selection. I would argue that the phenomenon you are witnessing is an increasing pivot from more theoretical to empirical AI safety mentors and research agendas.
Based on my personal experience, I think the claim “half of MATS couldn’t tell you why AI might be an existential risk” is incorrect. I can’t speak to how MATS scholars have engaged with you on AI strategy, but I would bet that the average MATS scholar today spends a lot more time on ML experiments than reading AI safety strategy docs compared to three years ago. To be clear, I think this is a good thing! I respect your disagreement here. MATS has tried to run AI safety strategy workshops and reading groups many times in the past, but this has generally had low engagement relative to our seminar series (which features some prominent AI safety strategists anyways). If you have great ideas for how to better structure strategy workshops or generate interest, I would love to hear! (We are currently brainstorming this.)
I mean, in as much as one is worried about Goodhart’s law, and the issue in contention is adversarial selection, then the acceptance rate going down over time is kind of the premise of the conversation. Like, it would be evidence against my model of the situation if the acceptance rate had been going up (since that would imply MATS is facing less adversarial pressure over time).
Mentor ratings is the most interesting category to me. As you can imagine I don’t care much for ML skill at the margin. CodeSignal is a bit interesting though I am not familiar enough with it to interpret it, but I might look into it.
I don’t know whether you have any plots of mentor ratings over time broken out by individual mentor. My best guess is the reason why mentor ratings are going up is because you have more mentors who are looking for basically just ML skill, and you have successfully found a way to connect people into ML roles.
This is of course where most of your incentive gradient was pointing to in the first place, as of course the entities that are just trying to hire ML researchers have the most resources, and you will get the most applicants for highly paid industry ML roles, which are currently among the most prestigious and most highly paid roles in the world (while of course being centrally responsible for the risk from AI that we are working on).
In regards to adversarial selection, we can compare MATS to SPAR. SPAR accepted ~300 applicants in their latest batch, ~3x MATS (it’s easier to scale if you’re remote, don’t offer stipends, and allow part-timers). I would bet that the average research impact of SPAR participants is significantly lower than that of MATS, though there might be plenty of confounders here. It might be worth doing a longitudinal study here comparing various training programs’ outcomes over time, including PIBBSS, ERA, etc.
I think your read of the situation re. mentor ratings is basically correct: increasingly many MATS mentors primarily care about research execution ability (generally ML), not AI safety strategy knowledge. I see this as a feature, not a bug, but I understand why you disagree. I think you are prioritizing a different skillset than most mentors that our mentor selection committee rates highly. Interestingly, most of the technical mentors that you rate highly seem to primarily care about object-level research ability and think that strategy/research taste can be learned on the job!
Note that I think the pendulum might start to swing back towards mentors valuing high-level AI safety strategy knowledge as the Iterator archetype is increasingly replaced/supplemented by AI. The Amplifier archetype seems increasingly in-demand as orgs scale, and we might see a surge in Connectors as AI agents improve to the point that their theoretical ideas are more testable. Also note that we might have different opinions on the optimal ratio of “visionaries” vs. “experimenters” in an emerging research field.
I mean, sure? I am not saying your selection is worse than useless and it would be better for you to literally accept all of them, that would clearly also be bad for MATS.
I mean, there are obvious coordination problems here. In as much as someone is modeling MATS as a hiring pipeline, and not necessarily the one most likely to produce executive-level talent, you will have huge amounts of pressure to produce line-worker talent. This doesn’t mean the ecosystem doesn’t need executive-level talent (indeed, this post is partially about how we need more), but of course large scaling organizations create more pressure for line-worker talent.
Two other issues with this paragraph:
Yes, I don’t think strategic judgement generally commutes. Most MATS mentors who I think are doing good research don’t necessarily themselves know what’s most important for the field.
I agree with the purported opinion that strategy/research taste can often be learned on the job. But I do feel very doomy about recruiting people who don’t seem to care deeply about x-risk. I would be kind of surprised if the mentors I am most excited about don’t have the same opinion, but it would be an interesting update if so!
I don’t particularly think these “archetypes” are real or track much of the important dimensions, so I am not really sure what you are saying here.
A few quick comments, on the same theme as but mostly unrelated to the exchange so far:
I’m not very sold on “cares about xrisk” as a key metric for technical researchers. I am more interested in people who want to very deeply understand how intelligence works (whether abstractly or in neural networks in particular). I think the former is sometimes a good proxy for the latter but it’s important not to conflate them. See this post for more.
Having said that, I don’t get much of a sense that many MATS scholars want to deeply understand how intelligence works. When I walked around the poster showcase at the most recent iteration of MATS, a large majority of the projects seemed like they’d prioritized pretty “shallow” investigations. Obviously it’s hard to complete deep scientific work in three months but at least on a quick skim I didn’t see many projects that seemed like they were even heading in that direction. (I’d cite Tom Ringstrom as one example of a MATS scholar who was trying to do deep and rigorous work, though I also think that his core assumptions are wrong.)
As one characterization of an alternative approach: my internship with Owain Evans back in 2017 consisted of me basically sitting around and thinking about AI safety for three months. I had some blog posts as output but nothing particularly legible. I think this helped nudge me towards thinking more deeply about AI safety subsequently (though it’s hard to assign specific credit).
There’s an incentive alignment problem where even if mentors want scholars to spend their time thinking carefully, the scholars’ careers will benefit most from legible projects. In my most recent MATS cohort I’ve selected for people who seem like they would be happy to just sit around and think for the whole time period without feeling much internal pressure to produce legible outputs. We’ll see how that goes.
Hmm, I was referring here to “who I would want to hire at Lightcone” (and similarly, who I expect other mentors would be interested in hiring for their orgs) where I do think I would want to hire people who are on board with that organizational mission.
At the field level, I think we probably still have some disagreement about how valuable people caring about the AI X-risk case is, but I feel a lot less strongly about it, and think I could end up pretty excited about a MATS-like program that is more oriented around doing ambitious understanding of the nature of intelligence.
Sounds like PIBBSS/PrincInt!
As an atypical applicant to MATS (no PhD, no coding/technical skills, not early career, new to AI), I found it incredibly difficult to find mentors who were looking to hold space for just thinking about intelligence. I’d have loved to apply to a stream that involved just thinking, writing, being challenged, and repeating until I had a thesis worth pursuing. To me, it seemed more like most mentors were looking to test very specific hypotheses, and maybe it’s for all the reasons you’ve stated above. But for someone new and inexperienced, I felt pretty unsure about applying at all.
This is not counter-evidence to the accusation that scholar quality has been going downhill unless you add in several other assumptions.
It’s not supposed to be counter-evidence in its own right. I like to present the full picture.
“To be clear, I think this is a good thing! I respect your disagreement here. MATS has tried to run AI safety strategy workshops and reading groups many times in the past, but this has generally had low engagement relative to our seminar series”
I suspect that achieving high-engagement will be hard because fellows have to compete for extension funding.
True, but we accepted 75% of all scholars into the 6-month extension last program, so the pressure might not be that large now.
What percentage applied?
I might have a special view here since I did MATS 4.0 and 8.0.
I think I met some excellent people at MATS 8.0 but would not say they are stronger than those in 4.0; my guess is that quality went down slightly. I remember a few people in 4.0 who impressed me quite a lot, which I saw less of in 8.0. (4.0 had more very incompetent people though.)
I think this is sadly somewhat true; I talked with some people in 8.0 who didn’t seem to have any particular concern with AI existential risk, or seemingly had never really thought about it. However, I think most people were in fact very concerned about AI existential risk. I ran a poll at some point about Eliezer’s new book, and a significant minority of students seemed to have pre-ordered it, which I guess is a pretty good proxy for whether someone is seriously engaging with AI X-risk.
My guess is that the recruitment process might need another variable to measure beyond academics/coding/ML experience. The kind of thing that Tim Hua (an 8.0 scholar who created an AI psychosis bench) has.
Also, it seems to me that if you build an organization that tries to fight against the end of the world from AI, somebody should say that. It might put off some people, and perhaps that should happen early. Maybe the website should say: “AI could kill literally everyone, let’s try to do something!”. And maybe the people who heard that this MATS thing is good to have on their CV for applying to a PhD or a lab and eventually landing a high-paying job would be put off by that.
Perhaps there should also be a test where people don’t have internet access and have to answer some basic alignment questions, like: why could a system that we optimize with RL develop power-seeking drives? Why might training an AI create weird, unpredictable preferences?
My guess at what’s happening here: for the first iterations of MATS (think MATS 2.0 at the Lightcone WeWork) you would have folks who were already into AI Safety for quite a long time and were interested in doing some form of internship-like thing for a summer. But as you run more cohorts (and make the cohorts bigger) then the density of people who have been interested in safety for a long time naturally decreases (because all the people who were interested in safety for years already applied to previous iterations).
(Derailing: what I am saying here is not central to the argument you are making here.)
While I think building safety-adjacent RL envs is worse than most kinds of technical safety work for people who are very high context in AGI safety, I think it’s net positive.
I think you reduce P(doom) by doing prosaic AI safety well (you train AIs to behave nicely, you don’t squash away malign-looking CoT and try not to have envs that create too much situational awareness, you do some black-box and maybe white-box auditing to probe for malign tendencies, you monitor for bad behavior in deployment, you try to not give too many affordances to AIs when it’s not too costly), especially if takeoffs are relatively slow, because it gives you more opportunities to catch early instances of scheming-related misalignment and more time to use mostly-aligned AIs to do safety research. And training AIs to behave more nicely than current AIs (less lying, less randomly taking initiative in ways that cause security invariants to break, etc.) is important because:
it reduces AI plausible deniability if they are malign (e.g. if AIs wrote insecure code for scheming-related reasons, I wouldn’t know even if I caught them writing insecure code, because current AIs do this all the time and are almost surely not doing it for scheming-related reasons)
it lets you have more trigger-happy monitors (e.g. it’s currently hard to have monitors that catch “lies that matter” because current AIs lie all the time on things like whether they succeeded at their task)
(unsure) I suspect some safety-adjacent RL environments have non-trivial transfer to reducing real misalignment (before AI are wildly superhuman)
(unclear sign) less lying/research-sycophancy probably differentially helps more with the fuzzy tasks that alignment research is often about (e.g. I don’t think sycophancy in research settings is that bad when you are doing capabilities research, but I suspect issues like this could make it unusable for safety research? Unclear)
I think the main negative effect is making AGI companies look more competent and less insanely risky than they actually are, and avoiding some warning shots. I don’t know how I feel about this. Not helping AGI companies pick the low-hanging fruit that actually makes the situation a bit better, so that they look more incompetent, does not seem like an amazing strategy to me if, like me, you believe there is a >50% chance that well-executed prosaic stuff is enough to get to a point where AIs more competent than us are aligned enough to do the safety work to align more powerful AIs. I suspect AGI companies will be PR-maxing and will build the RL environments that most make them look good, such that the safety-adjacent RL envs that OP subsidizes don’t help with PR that much, so I don’t think the PR effects will be very big. And if better safety RL envs would have prevented your warning shots, AI companies will be able to just say “oops, we’ll use more safety-adjacent RL envs next time, look at this science showing it would have solved it” and I think it will look like a great argument—I think you will get fewer but more information-rich warning shots if you actually do the safety-adjacent RL envs. (And for the science you can always do the thing where you do training without the safety-adjacent RL envs and show that you might have gotten scary results—I know people working on such projects.)
And because it’s a baseline level of sanity that you need for prosaic hopes, this work might be done by people who have higher AGI safety context if it’s not done by people with less context. (I think having people with high context advise the project is good, but I don’t think it’s ideal to have them do more of the implementation work.)
I think it’s a pretty high-variance activity! It’s not that I can’t imagine any kind of RL environment that might make things better, but most of them will just be used to make AIs “more helpful” and serve as generic training data to ascend the capabilities frontier.
Like, yes, there are some more interesting monitor-shaped RL environments, and I would actually be interested in digging into the details of how good or bad some of them would be, but the thing I am expecting here are more like “oh, we made a Wikipedia navigation environment, which reduces hallucinations in AI, which is totally helpful for safety I promise”, when really, I think that is just a straightforward capabilities push.
As part of my startup exploration, I would like to discuss this as well. It would be helpful to clarify my thinking on whether there’s a shape of such a business that could be meaningfully positive. I’ve started reaching out to people who work in the labs to get better context on this. I think it would be good to dig deeper into Evan’s comment on the topic.
I’m going to start a Google Doc, but I would love to talk in person with folks in the Bay about this to ideate and refine it faster.
This is consistent with founders being undervalued in AI safety relative to AI capabilities. My model of Elon, for instance, says that a big reason he pivoted hard towards capabilities was that all the capabilities founders were receiving more status than the safety founders.
Sorry, I know this is tangential, but I’m curious — is it based on it being less psychosis-inducing in this investigation or are there more data points / is it known to be otherwise more aligned as well?
What do you think are ways to identify good strategic takes? This is something that seems rather fuzzy to me. It’s not clear how people are judging criteria like this or what they think is needed to improve on this.
I want to register disagreement. Multiplier effects are difficult to get and easy to overestimate. It’s very difficult to get other people working on the right problem, rather than slipping off and working on an easier but ultimately useless problem. From my perspective, it looks like MATS fell into this exact trap. MATS has kicked out ~all the mentors who were focused on real problems (in technical alignment) and has a large stack of new mentors working on useless but easy problems.
[Edit 5hrs later: I think this has too much karma because it’s political and aggressive. It’s a very low effort criticism without argument.]
To clarify, by “kicking out” Jeremy is referring to two mentors in particular, both of whom got a lot of support from PIBBSS and one of whom seemed to want more of an engineering assistant than a research scholar. I think both do important research and it was a tough decision, informed by our mentor selection committee, which included experts in their field, and past scholar feedback. I offered both help with hiring, including access to our alumni hiring database.
Re. “useless but easy problems”, we agree to disagree. Mentor selection at MATS is very hard, so we defer a lot to a committee of experts. Admittedly, choosing this committee necessarily entails some bias. I’d be interested if anyone wants to DM me nominations!
(Ryan is correct about what I’m referring to, and I don’t know any details).
I want to say publicly, since my comment above is a bit cruel in singling out MATS specifically: I think MATS is the most impressively well-run organisation that I’ve encountered, and overall supports good research. Ryan has engaged at length with my criticisms (both now and when I’ve raised them before), as have others on the MATS team, and I appreciate this a lot.
Ultimately most of our disagreements are about things that I think a majority of “the alignment field” is getting wrong. I think most people don’t consider it Ryan’s responsibility to do better at research prioritization than the field as a whole. But I do. It’s easy to shirk responsibility by deferring to committees, so I don’t consider that a good excuse.
A good excuse is defending the object-level research prioritization decisions, which Ryan and other MATS employees happily do. I appreciate them for this, and we agree to disagree for now.
Tying back to the OP, I maintain that multiplier effects are often overrated because of people “slipping off the real problem” and this is a particularly large problem with founders of new orgs.
I think that being a good founder in AI safety is very hard, and generally only recommend doing it after having some experience in the field—this strongly applies to research orgs, but also to eg field building. If you’re founding something, you need to constantly make judgements about what is best, and don’t really have mentors to defer to, unlike many entry level safety roles, and often won’t get clear feedback from reality if you get them wrong. And these are very hard questions, and if you don’t get them right, there’s a good chance your org is mediocre. I think this applies even to orgs within an existing research agenda (most attempts to found mech interp orgs seem doomed to me). Field building is a bit less dicey, but even then, you want strong community connections and a sense for what will and will not work.
I’m very excited for there to be more good founders in AI Safety, but don’t think loudly signal boosting this to junior people is a good way to achieve this. And imo “founding an org” is already pretty high status, at least if you’re perceived to have some momentum behind you?
I’m also fine with people without a lot of AI safety expertise partnering with those who do have it as co-founders, but I struggle to think of orgs that I think have gone well that didn’t have at least one highly experienced and competent co-founder.
Did Apollo have anyone you’d consider highly experienced when first starting out?
I’d say Chris Akin (COO) was highly experienced, and he joined shortly after inception.
Neel was talking about AI safety expertise and experience in the AI safety field. I can’t see that Chris had any such experience on his LinkedIn.
Of note: when I first approached you about becoming a MATS mentor, I don’t think you had significant field-building or mentorship experience and had relatively few papers. Since then, you have become one of the most impactful field-builders, mentors, and researchers in AI safety, by my estimation! This is a bet I would take again.
I think that founding, like research, is best learned by doing. Building a research org definitely benefits from having great research takes; this unlocks funding, inspires talent, and creates better products (i.e., impactful research). However, I believe:
Not every great researcher would be a great founder.
Some researchers who could be great founders with practice are unnecessarily discouraged from trying.
There are many ways to aid AI safety as a founder that do not require research skills (e.g., field-building, advocacy, product development).
I wasn’t primarily trying to signal boost this to “junior” people and I think pairing strong ops and technical talent is a good way to start many orgs (though everyone typically contributes to everything in a small startup).
I think you are probably unusually good at spotting which mech interp orgs are doomed ex ante, but you aren’t infallible. And I think a situation where many small startups are being founded, even if most will be doomed, is what a functional startup ecosystem looks like! We don’t want people working on obviously bad ideas, but I naively expect the process of startup ideation and experimentation, aided by VC money, to yield good mech interp directions.
It’s very difficult to come up with AI safety startup ideas that are VC-fundable. This seems like a recipe for coming up with nice-sounding but ultimately useless ideas, or wasting a lot of effort on stuff that looks good to VCs but doesn’t advance AI safety in any way.
Maybe so! I don’t think Eric Ho’s ideas are terrible and I’ve seen for-profit AI safety startups that I like (e.g., Goodfire) and that I don’t like (e.g., Softmax, probably).
I disagree with this frame. Founders should deeply understand the area they are founding an organization to deal with. It’s not enough to be “good at founding”.
I completely agree with you! Where did you think I implied the opposite?
My bad, I read you as disagreeing with Neel’s point that it’s good to gain experience in the field or otherwise become very competent at the type of thing your org is tackling before founding an AI safety org.
That is, I read “I think that founding, like research, is best learned by doing” as “go straight into founding and learn as you go along”.
No worries! I think research startups should be founded by strong researchers. But there are lots of potentially impactful startups (field-building, advocacy, product, etc.) that don’t require founders with research skills, and these might be best served by learning on the job?
I think those other types of startups also benefit from expertise and deep understanding of the relevant topics (for example, for advocacy, what are you advocating for and why, how well do you understand the surrounding arguments and thinking...). You don’t want someone who doesn’t understand the “field” working on “field-building”.
You’re probably right that the best startups come from people who have great experience in the thing, but plenty of profitable startups get founded by kids out of college. The risk/reward tradeoff is probably different in tech. I think the best AI safety field-building startups were founded/scaled by people with experience in field-building (e.g., my experience with EA UQ, Dewi’s experience with EA Cambridge, Agus’ experience with CEA, etc.), but the bar might be surprisingly low.
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I’d guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., they impact the intended system in roughly the intended way, except with the sign of the key effect flipped—it seems disturbingly common to have “paradoxical impact.”
I suspect the risk of paradoxical impact—even from your own work—is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it’s worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.
I like the phrase “paradoxical impact”.
I feel considerations around paradoxical impact are a big part of my world model and I would like to see more discussion about it.
See my post on pessimization.
I’m sorry to hear about your paradoxical impact; this sounds tough and it’s a fear I share. I feel a bit better about MATS’ impact because very few of our alumni work on AI capabilities at frontier labs (~2% by my estimation) and very few work at OpenAI altogether, but I can understand if you feel that the 22% working on AI safety at for-profit companies are primarily doing “safetywashing” or something (on net I disagree, but it’s a valid concern).
I think there is something for me to learn from your experience: at the time MIRI was running AIRCS, OpenAI was not an AI safety pariah; it’s possible that some of the companies that MATS alums join now will become pariahs in future, revealing paradoxical impact. I’m not sure what to do about this other than encourage people to be intentional with their careers, question assumptions, and “don’t do evil” (the MATS values are impact first, scout mindset, reasoning transparency, and servant leadership). I think that AI safety has to scale to have a chance at solving alignment in time; this means that some people will end up working on counter-productive things. I can understand if your risk tolerance is different than mine, or you are more skeptical about the impact of MATS or the founders who might be inspired by my post.
I do think I’d feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn’t really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people’s whole careers, which is wild; I think it’s easy to underestimate how valuable this can be. But I think it’s similarly easy to underestimate their risk, given that the source of this leverage—that you’re investing relatively little time getting to know them, etc, relative to the time they’ll spend doing… something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it’s worth taking unusual care, when pursuing multiplier effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example, whether A) there might be lemons market problems, such that those who are easiest to influence (especially quickly) might tend all else equal to be more strategically confused/confusable, or B) whether there might in fact currently be more easy ways to make AI risk worse than better, etc.
Edit: I mistakenly said “27% at frontier labs” when I should have said “27% at for-profit companies”. Also, note that this is 27% of those working on AI safety (80%), so 22% of all alumni.
It is hard to predict this, but I think we could have done better (and can do better in the future still).
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us that the world would have been better off if we hadn’t made the problem known, since I roughly can’t imagine worlds where we do survive where the problem wasn’t made known, and I think it should be pretty expected with a problem this confusing that initially people will have little idea how to help, and so many initial attempts won’t. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn’t what you mean exactly, and there are ways we could have made the problem known that seemed less like “freaking out”? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).
My question is, how do you make AI risk known while minimizing the risk of paradoxical impacts? “Never talk about it” is the wrong answer, but I expect there’s a way to do better than we’ve done so far. This seems like an important thing to try to understand.
I don’t think this captures the counterarguments well, so here is one:
You can imagine a spectrum of founders where, on one end, you have people who understand themselves as founders and want to be marshaling an army to solve AI alignment. On the other end, you have basically researchers who see work that should be done, don’t have capacity to do the work themselves, and this leads them to create teams and orgs—“reluctant founders”.
It’s reasonable to be skeptical about what the “founder type” end of the spectrum will do.
In normal startups, the ultimate feedback loop is provided by the market. In AI safety nonprofits, the main feedback loops are provided by funders, AGI labs, and Bay Area prestige gradients.
Bay Area prestige gradients are to a large extent captured by AGI labs—the majority of quality-weighted “AI safety” already works there, the work is “obviously impactful”, you are close to the game, etc., and normal ML people also want to work there.
If someone wants to scale a lot, “funders” means mostly OpenPhil—no other source would fund the army. The dominant OpenPhil worldview is closely related to Anthropic—for example, until recently you would hear from senior OP staff that working in the labs is often strategically the best thing you can do.
Taken together, it’s reasonable to expect the “founder type” to be captured by the incentive landscape and work on stuff that is quite aligned with AGI developers / what people working there want, need, or endorse, and/or what OP likes.
(A MATS skeptic could say this is also true about MATS: the main thing going on seems to be recruiting and training ML talent to work for “the labs”; in this perspective, given that AI safety is funding constrained, it seems unclear why scarce AI safety funding is best deployed to make recruitment & training easier for extremely well resourced companies)
Personally I’m more optimistic about people somewhere around ~70% of the spectrum toward the research side, who mostly have some research taste, strategy, judgement… but I don’t think you attract them by the interventions you propose.
I like this comment. I think it’s easy to overfit on the most salient research agendas, especially if there are echo chambers and tight coupling between highly paid frontier AI staff and nonprofit funders. The best way I know to combat this at MATS is:
Maintain a broad church of AI safety research, including deliberately making mentor “diversity picks” and choosing a mentor selection committee that contains divergent thinkers. As another example, I think Constellation has done a good job recently at expanding member diversity and reducing echo chambers.
Requiring that COIs be declared and mitigated, including along reporting chains, at the same organization, with romantic/sexual partners, and with frequent research collaborators.
Encouraging “scout mindset” and “reasoning transparency”, especially among people with divergent beliefs. I think this is a large strength of MATS: we are a melting pot for ideas and biases.
Note that I expect overfitting to decrease with further scale and diversity, given the above practices are adhered to!
I agree the AI safety field in general vastly undervalues building things, especially compared to winning intellectual status ladders (e.g. LessWrong posting, passing the Anthropic recruiting funnel, etc.).
However, as I’ve written before:
If you want to do interpretability research in the standard paradigm, Goodfire exists. If you want to do evals, METR exists. Now, new types of evals are valuable (e.g. Andon Labs & vending bench). And maybe there’s some interp paradigm that offers a breakthrough.
But why found? Because there is a problem where everyone else is dropping the ball, so there is no existing machine where you can turn the crank and get results towards that problem.
Now of course I have my opinions on where exactly everyone else is dropping the ball. But no doubt there are other things as well.
To pick up the balls, you don’t start the 5th evals company or the 4th interp lab. My worry is that that’s what all the steps listed in “How to become a founder” point towards. Incubators, circulating pitches, asking for feedback on ideas, applying to RFPs, talking to VCs—all of these are incredibly externally-directed, non-object-level, meta things. Distilling the zeitgeist. If a ball is dropped, it is usually because people don’t see that it is dropped, and you will not discover the droppedness by going around asking “hey what ball is dropped that the ecosystem is not realizing?”. You cannot crowdsource the idea.
This relates to another failure of AI safety culture: insufficient and bad strategic thinking, and a narrowmindedness over the solutions. “Not enough building” and “not enough strategy/ideas” sound opposed, when you put them on some sort of academic v doer spectrum. But the real spectrum is whether you’re winning or not, and “a lack of progress because everyone is turning the same few cranks and concrete building towards the goal is not happening” and “the existing types of large-scale efforts are wrong or insufficient” are, in a way, related failure modes.
Also, of course, beware of the skulls. “A frontier lab pursuing superintelligence, except actually good, this time, because we are trustworthy people and will totally use our power to take over the world for only good”
I definitely think marginal founders should focus on low-hanging fruit for impact. Do you have a list of potential startup ideas you like?
I have a different opinion about the utility of red teaming pitches/ToCs; based on experience, I think this can help spot blindspots in the ecosystem! I also think many AI safety founders, funders etc. are walking around with a long list of things they want someone to build; I have one, at least, and I’ve read a few.
I’m also not so sure that another evals or auditing company would be bad. There are only 3-4 decent-sized AI safety evals orgs! That’s a small number of people to analyze large, ever-changing models with vast threat surfaces. There’s plenty of room for differentiation and specialization (e.g., biorisk, cyber-risk, AI control evals, AI elicitation evals, human manipulation risk, bio R&D capabilities, AI coordination risk, etc.).
Maybe this is irrelevant, but I’d be surprised if a tech founder were deterred from founding a startup just because a similar startup already exists, provided there’s high demand. In some cases, I might be concerned (e.g., regulatory capture of token government auditors), but I’m not concerned by a doubling of Apollo, Goodfire, METR, Transluce, MATS, etc. Competition can be good! Maybe not as good as filling a gap, but it doesn’t seem net harmful to have more orgs working on the same problem; there’s plenty of funding, space to differentiate, and problems to work on!
For what it’s worth, I think Goodfire is taking a non-standard approach to interpretability research, more so than (e.g.) Transluce. (I’m not claiming that the non-standard approach is better than the standard one.)
This is my first LessWrong comment; any feedback appreciated.
My quick takes (with a similar conflict: I’m doing AIS field-building).
I am inclined to agree with ~everything in this post.
I think the status dynamics are hard to overstate.
I know quite a few very competent builders / ‘doers’ who have bounced off EA/AIS.
(a) Part of this is about the elevated status given to researchers, especially in contrast with the way ‘operations’ people (a catch-all used to encompass a large fraction of everything else) are treated.
(b) The response I often hear, explicitly or in the undertones, is: ‘But if they were really committed, they’d just do the thing that needs to be done.’ So they are expected to ignore the status gradients and e.g. build a nonprofit that is illegible to those outside EA/AIS, or a for-profit with a harder path to profitability.
(c) Meanwhile, these communities are usually excited about people doing relevant-seeming AIS research, even when they might be doing so because it’s interesting or high-status, rather than because it’s the highest-impact or most important thing to do for AIS.
I think (c) is usually good—pain is not the unit of effort and people are usually more productive when they enjoy and feel valued for doing something.
But (b) and (c) together mean we hold people to a much higher standard for building than for researching. Unlike researchers, we expect builders to be really committed and fight against incentive gradients, rather than shifting the incentives. This is even though (low-confidence take) we might need the marginal builder more than the marginal ‘equally-skilled’ AIS researcher.
(Side note: In practice, I think the skills for these paths are fairly uncorrelated, such that this comparison is relevant for how we shape the field and prioritise people, but usually not whether a particular person does research or building.)
Two cruxes are as follows (hat tip Habryka—how do I tag him?):
1. How hard and/or necessary is it to manage a much larger AIS community / recruitment pipeline?
I think it might be necessary to massively grow the AIS community, in which case we just have to pay the ‘trust penalty’ of having some schemers try to get resources and status, making it harder to assess people and make progress in general.
It’s harder to screen for qualities you want when many people are vying for jobs and might be Goodharting the criteria. But how much harder?
On the one hand, many companies in competitive industries face this problem and still seem to do fine (investment banks, top consulting firms, quant trading firms, etc.).
On the other hand, AIS often has bad feedback loops, such that it’s harder to tell if someone is not really optimising for the important thing (or is even explicitly just focusing on looking good), and success may be much harder.
I think I would change my view if I thought it was unnecessary and extremely hard to manage a much larger AIS community / recruitment pipeline.
I want to register a strong, although potentially unfair, gut reaction to Habryka’s comment, which was something along the lines of: ‘This feels like classic rationalisty ex-post justification for prioritising vibes over winning. A smaller community with people who are more value-aligned can feel easier, but AIS probably just needs to grow a lot. Maybe you don’t like having to figure out people’s intentions and be in a world where people aren’t so transparent. But maybe it’s needed. This vaguely reminds me of EAs saying they need to hire other EAs when perhaps the real problem is that they’re not good enough at management to manage non-EAs.’
2. How important is deep research understanding to building successful AIS orgs?
If you need strong research understanding to build useful orgs, maybe we should prioritise people to do research first. Then the question would be how to shift incentives to move people out of pure research and into org-building later.
Empirically, the most successful—or at least influential—AIS orgs have been built by people with strong research understanding.
My guess is that we’re underusing non-researcher/researcher teams, and also that there might be some stuff (e.g. in biosecurity) that people with very little research background can build successfully. There’s also probably a tradeoff (2(e)(i)) between great researchers and great builders, and we need more than just research orgs.
Great comment! To your point about shifting incentives of researchers to be founders: about 10% of MATS alums have founded something, and about the only thing I did was give a lightning talk each program and tell interested founders to chat with me. I think founders tend to self-select once you make the option clear, which is part of my intent with this post. Note that I’m not trying to claim credit for all the founders who came through MATS; I expect most were already interested in founding things.
Also, note that Catalyze Impact (and maybe other incubators) has received tons of applications from researchers. I agree that founder skills and researcher skills are not the same thing, but research orgs tend to be led by researchers. Even large research nonprofits like RAND, AI2, ATI, SFI have leaders who spent some time in research roles, though usually not for most of their careers.
I don’t think I quite understand the distinction you are trying to draw between “founders” and (not a literal quote) “people who do object-level work and make intellectual contributions by writing”.
If you’re the CEO of a company, it’s your job to understand the space your company works in and develop extremely good takes about where the field is going and what your company should do, and use your expertise in leveraged ways to make the company go better.
In the context of AI safety, the key product that organizations are trying to produce is often itself research, and a key input is hiring talented people. So I think it makes a lot of sense that e.g. I spend a lot of my time thinking about the research that’s happening at my org.
Analogously, I don’t think it should be considered surprising or foolish if Elon Musk knows a lot about rockets and spends a lot of his time talking to engineers about rockets.
I do think that I am personally more motivated to do novel intellectual work than would be optimal for Redwood’s interests.
I also think that the status gradients and social pressures inside the AI safety community have a variety of distorting effects on my motivations that probably cause me to take worse actions.
I think I personally feel the status gradient problems more than other AI safety executives do, because a lot of AI safety people undervalue multiplier efforts. This has meant that working at MATS is less prestigious than I’d like, and that MATS therefore has more trouble hiring.
I think you’re a great example of a successful founder who is also a prolific researcher and writer. I wish I had your capacity for the last two; you’ve been high impact in all three channels!
I think you’re right that research startups should generally be led by researchers, and that good researchers track the field closely and ideally publish. I think at some size of organization this becomes much harder, but I don’t want to deter it! If Elon wants to go deep on his rockets, this seems good, even if he’s an outlier CEO.
I was trying to say two somewhat related things in this article:
The status gradients strongly favor “become a researcher” over “become a founder”, which means we have fewer founders than ideal and our successful founders tend to follow the “lab PI” archetype, for better or worse.
Implied: there is plenty of value that founders in non-research roles can add (field-building, advocacy, product development, etc.), and this is systematically undervalued relative to its impact, which discourages people from trying.
For your point 2, are you thinking about founders in organizations that have theories of change other than doing research? Or are you thinking of founders at research orgs?
The former. Even large research nonprofits (e.g., RAND, AI2, ATI, SFI) tend to be led by people with research experience, though they probably do a lot less research than CEOs at small research orgs.
I totally agree with the sentiment here!
As a researcher, founder, and early employee of multiple non-profits in this space, I think it’s critical to start building out the infrastructure to leverage talent and enable safety work. Right now, there isn’t much support for people making their own opportunities, not to mention that doing so necessarily requires a more stable financial situation than is possible for many individuals.
One of my core goals in starting Kairos.fm was to help people who want to start their own projects (e.g. podcasts), and to amplify the reach of others.
While I’m not solely focused on founders/field builders, I have had quite a few on one of my shows, and I’d be incredibly excited to have more.
For any founders reading this, I would love to have you as a guest on the Into AI Safety podcast to discuss your journey and work. If you’re interested, reach out to me at listen@kairos.fm.
Founders seem especially undervalued if you think in counterfactual terms. For an existing org, the right question is usually “what’s this person’s marginal value over a reasonably good replacement hire?” For a founder, the question is often “would this organization with this theory of change and all the roles it creates exist at all without them?” That’s a qualitatively different kind of contribution.
In a field as nascent as AI safety, new orgs don’t just scale existing work; they create net-new surface area for people to do safety-motivated work at all, including people with more operational, policy, or domain backgrounds who don’t fit neatly into the current research-centric pipeline. That seems like a strong structural reason to place more weight on founder/field‑builder impact, not less, especially when a new org is doing something meaningfully distinct from existing approaches.
Hey Ryan, nice post. Here are some thoughts.
I think AI safety founders should be risk-averse.
For-profit investors like risk-seeking founders because for-profit orgs have unlimited upside and limited downside (you can’t lose more money than you invest), and hence investors can expect ROI on a portfolio of high-variance, decorrelated startups. You get high variance with risk-seeking founders, and decorrelation with contrarian founders. But AI safety isn’t like this. The downside is just as unlimited as the upside, so you can’t expect ROI simply because the orgs are high-variance and uncorrelated; cf. the unilateralist’s curse.
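As a toy illustration of this asymmetry (all numbers below are invented purely for illustration, not estimates of real payoffs or harms), here is a minimal sketch comparing a portfolio of capped-downside bets with one where rare outcomes can cost far more than the investment:

```python
import random

def mean_portfolio_payoff(n_startups=100_000, cap_downside=True, seed=0):
    """Toy model: each startup costs 1 unit. With small probability it pays
    off hugely; otherwise it fails. If cap_downside is True, a failure costs
    at most the investment (-1), as with for-profit equity. If False, rare
    failures can cost far more than the stake, standing in for unbounded
    downside (e.g. harms that aren't borne by the investor)."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_startups):
        if random.random() < 0.02:      # rare big win
            total += 50.0
        elif cap_downside:
            total -= 1.0                # lose the investment, nothing more
        else:
            # occasionally a failure is catastrophic, not just a write-off
            total -= 100.0 if random.random() < 0.25 else 1.0
    return total / n_startups

print("capped downside:  ", round(mean_portfolio_payoff(cap_downside=True), 2))
print("uncapped downside:", round(mean_portfolio_payoff(cap_downside=False), 2))
```

With capped downside, the rare big wins dominate and the portfolio’s mean payoff comes out positive; once failures can cost far more than the stake, piling on high-variance, uncorrelated bets no longer rescues the expected value.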
I think frontier labs have an easier time selecting for talent than AI safety orgs. Partly because they need to care less about virtue/mission alignment.
Cheers, Cleo!
Even if AI safety founders should be risk averse, I think we should do better at supporting the relatively few competent founder-types who are deeply interested in AI safety.
I suspect that we disagree significantly on the potential downside risk of most AI safety startups. I think it’s relatively hard to have a significant negative impact, particularly one that outweighs the expected benefits, given how much optimization pressure is being applied to advancing AI capabilities across the economy. Creating a new frontier AI company (e.g., Mistral-sized) or a toxic advocacy org would be notable exceptions. Maybe Mechanize and Calaveras are exceptions too?
Note that Anthropic, at least, has a hard time finding talent that is also mission-aligned, which they prefer, particularly for safety teams.
I suspect that the undervaluing of field-building is downstream of EA overupdating on The Meta Trap (I appreciated points 1 & 5; point 2 probably looks worst in retrospect).
I don’t know if founding is still undervalued—seems like there’s a lot in the space these days.
“I confess that I don’t really understand this concern”
Have you heard of Eternal September? If a field/group/movement grows at less than a certain rate, then there’s time for new folks to absorb the existing culture/knowledge/strategic takes and then pass it on to the folks after them. However, this breaks down if the growth happens too fast.
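A minimal sketch of that dynamic (all parameters are made up; this is just to illustrate the shape of the argument, not to model the AIS community): suppose each acculturated member can onboard only a limited number of newcomers per period, and the community grows at some rate. Below a threshold growth rate, everyone gets acculturated; above it, the acculturated fraction steadily shrinks.

```python
def acculturated_fraction(growth_rate, mentoring_capacity=0.3,
                          periods=20, initial_members=100):
    """Toy Eternal-September model: each period the community grows by
    `growth_rate`. Acculturated members can collectively onboard at most
    `mentoring_capacity` newcomers per acculturated member per period;
    excess newcomers stay unacculturated (and don't pass the culture on).
    Returns the acculturated fraction after `periods` periods."""
    acculturated = float(initial_members)
    total = float(initial_members)
    for _ in range(periods):
        newcomers = total * growth_rate
        onboarded = min(newcomers, acculturated * mentoring_capacity)
        acculturated += onboarded
        total += newcomers
    return acculturated / total

for g in (0.2, 0.5, 1.0):
    print(f"growth {g:.0%}/period -> acculturated fraction "
          f"{acculturated_fraction(g):.2f}")
```

The numbers mean nothing in themselves; the point is only that the breakdown is a threshold effect in the growth rate.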
“We should be careful not to dilute the quality of the field by scaling too fast… If outreach funnels attract a large number of low-caliber talent to AI safety, we can enforce high standards for research grants and second-stage programs like ARENA and MATS. If forums like LessWrong or the EA Forum become overcrowded with low-calibre posts, we can adjust content moderation or the effect of karma on visibility.”
Firstly, filtering/selection isn’t free. It takes money and time from highly skilled people, and it also increases the chance of good candidates being overlooked in the sea of applications, since it forces you to filter more aggressively.
Secondly, people need high-quality peers in order to develop intellectually. Even if second-stage programs manage to avoid being diluted, adding a bunch of low-caliber talent to local community groups would make it harder for people to develop intellectually before reaching the second-stage programs; in other words, it would undercut the talent development pipeline for these later-stage programs.
“Additionally, growing the AI safety field is far from guaranteed to reduce the average quality of research, as most smart people are not working on AI safety and, until recently, AI safety had poor academic legibility. Even if growing the field reduces the average researcher quality, I expect this will result in more net impact”
I suspect AI safety research is very heavy-tailed, and what would encourage the best folks to enter the field is not so much the field being large as the field having a high density of talent.
Thanks for sharing, Akshyae! Based on the DMs I received after posting this, I think your experience is unfortunately common. Great job sticking at it and launching Explainable!
This comment was pretty obviously AI-written and shouldn’t have made it past the LessWrong content moderation! (Sorry about that; we have automatic flagging of AI-written posts, but haven’t yet activated it for comments.)
It’s plausible it was written reflecting a real human experience, but I wouldn’t trust it. It got very high scores on 3 AI-detection platforms I tried. (And also, posting AI slop in this context feels particularly sad.)