Thanks for reading and replying! I’ll be brief:

I consider the central examples of successful AI safety org founding to be Redwood, METR, Transluce, GovAI, Apollo, FAR AI, MIRI, LawZero, Pattern Labs, CAIS, Goodfire, Palisade, BlueDot, Constellation, MATS, Horizon, etc. Broader-focus orgs like 80,000 Hours, Lightcone, CEA, and others have also had large impact. Apologies to all those I’ve missed!
I definitely think founders should workshop their ideas a lot, but this is not necessarily the same thing as publishing original research or writing on forums. Caveat: research org founders often should be leading research papers.
I don’t think that a great founder will have more impact on scaling the AI safety research field by working at “Anthropic, GDM, or FAR Labs” than by founding a new research org or training program.
Maybe I’m naive about how easy it is to adjust standards for grantmakers or training programs. My experience with MATS, LISA, and Manifund has involved a lot of selection, and the bar at MATS has risen every program for 4 years now, but I don’t feel a lot of pressure from rejected applicants to lower our standards. Maybe this will come with time? Or maybe it’s an ecosystem-wide effect? I see the escalating pressure on elite university admissions as unideal, but not a field-killer; plus, AI safety seems far from this point. I acknowledge that you have a lot of experience with LTFF and other selection processes.
I don’t think AI companies scaling 2-3x/year is good for the world. I do think AI safety talent failing to keep up is bad for the world. It’s not so much an adversarial dynamic as a race to lower the alignment tax as much as possible at every stage.
I don’t think that Anthropic’s safety work is zero value. I’d like to see more people working on ASL-4/5 safety at Anthropic and Kimi, all else equal. I’d also like to see more AI safety training programs supplying talent, nonprofit orgs scaling auditing and research, and advocacy orgs shifting public perception.
I’m not sure how to think about CEA (and I lack your information here), but my first reaction is not “CEA should have been led by researchers.” I also don’t think Open Phil is a good example of an org that lacked researchers; some of the best worldview investigations research imo came from Open Phil staff or affiliates, including Joe Carlsmith, Ajeya Cotra, Holden Karnofsky, Carl Shulman, etc.
I’m more optimistic than you about the impact of encouraging more AI safety founders. I’m particularly excited by Halcyon Future’s work in helping launch Goodfire, AIUC, Lucid Computing, Transluce, Seismic, AVERI, Fathom, etc. To date, I know of only two such RL dataset startups that spawned via AI safety (Mechanize, Calaveras) in contrast to ~150 AI safety-promoting orgs (though I’m sure there are other examples of AI safety-detracting startups).
I fully endorse more potential founders writing up pitches or theories of change for discussion on LW or founder networks! I think this can only strengthen their impact.
the bar at MATS has risen every program for 4 years now
What?! Something terrible must be going on in your mechanisms for evaluating people (which to be clear, isn’t surprising, indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y’all and you thought it was worth the tradeoff of having bigger cohorts, but you thinking it has “gone up continuously” shows a huge disconnect.
Like, these days at the end of a MATS program half of the people couldn’t really tell you why AI might be an existential risk at all. Their eyes glaze over when you try to talk about AI strategy. IDK, maybe these people are better ML researchers, but obviously they are worse contributors to the field than the people in the early cohorts.
Yeah, I mean, I do think I am a lot more pessimistic about all of these. If you want, we can make a bet on how well things have played out with these in 5 years, deferring to some small panel of trusted third-party people.
To date, I know of only two such RL dataset startups that spawned via AI safety
Agree. Making RL environments/datasets has only very recently become a highly profitable thing, so you shouldn’t expect much! I am happy to make bets that we will see many more in the next 1-2 years.
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
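(Taken together, those two figures imply that roughly 4.3/7 ≈ 60% of first-stage applicants went on to complete all application stages.)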
I don’t have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as “Iterators”. Also, while the fraction of the cohort that displays a strong “Connector” disposition seems to have decreased over time, I think that the raw number of strong Connectors has generally increased with program size due to our research diversity metric in mentor selection. I would argue that the phenomenon you are witnessing is an increasing pivot from more theoretical to empirical AI safety mentors and research agendas.
Based on my personal experience, I think the claim “half of MATS couldn’t tell you why AI might be an existential risk” is incorrect. I can’t speak to how MATS scholars have engaged with you on AI strategy, but I would bet that the average MATS scholar today spends a lot more time on ML experiments than reading AI safety strategy docs compared to three years ago. To be clear, I think this is a good thing! I respect your disagreement here. MATS has tried to run AI safety strategy workshops and reading groups many times in the past, but this has generally had low engagement relative to our seminar series (which features some prominent AI safety strategists anyway). If you have great ideas for how to better structure strategy workshops or generate interest, I would love to hear them! (We are currently brainstorming this.)
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
I mean, in as much as one is worried about Goodhart’s law, and the issue in contention is adversarial selection, then the acceptance rate going down over time is kind of the premise of the conversation. Like, it would be evidence against my model of the situation if the acceptance rate had been going up (since that would imply MATS is facing less adversarial pressure over time).
I don’t have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as “Iterators”.
Mentor ratings is the most interesting category to me. As you can imagine I don’t care much for ML skill at the margin. CodeSignal is a bit interesting though I am not familiar enough with it to interpret it, but I might look into it.
I don’t know whether you have any plots of mentor ratings over time broken out by individual mentor. My best guess is the reason why mentor ratings are going up is because you have more mentors who are looking for basically just ML skill, and you have successfully found a way to connect people into ML roles.
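Something like the following is the breakdown I’d want to see (a rough sketch only; the file and column names are hypothetical, not your actual data):

```python
# Rough sketch (hypothetical file/column names): compare the pooled trend in
# mentor ratings against the trend among mentors who appear in several cohorts,
# to check whether the increase is a composition effect.
import pandas as pd

ratings = pd.read_csv("scholar_ratings.csv")  # assumed columns: cohort, mentor, rating

# Pooled trend: mean rating per cohort across all mentors.
pooled = ratings.groupby("cohort")["rating"].mean()

# Within-mentor trend: restrict to mentors present in >= 3 cohorts, so the
# comparison isn't driven by who joined the mentor pool when.
cohorts_per_mentor = ratings.groupby("mentor")["cohort"].nunique()
repeat_mentors = cohorts_per_mentor[cohorts_per_mentor >= 3].index
within = (
    ratings[ratings["mentor"].isin(repeat_mentors)]
    .groupby(["mentor", "cohort"])["rating"].mean()  # per-mentor cohort means
    .groupby(level="cohort").mean()                  # averaged across repeat mentors
)

print(pd.DataFrame({"pooled": pooled, "repeat_mentors_only": within}))
```

If the pooled series rises while the repeat-mentor series stays flat, that would point to a composition effect (new mentors selecting mostly for ML skill) rather than scholars getting uniformly better.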
This is of course where most of your incentive gradient was pointing in the first place: the entities that are just trying to hire ML researchers have the most resources, and you will get the most applicants for highly paid industry ML roles, which are currently among the most prestigious and highest-paid roles in the world (while of course being centrally responsible for the risk from AI that we are working on).
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
This is not counter-evidence to the accusation that scholar quality has been going downhill unless you add in several other assumptions.
It’s not supposed to be counter-evidence in its own right. I like to present the full picture.