I don’t think this captures the counterarguments well, so here is one:
You can imagine a spectrum of funders. At one end, you have people who understand themselves as funders and want to marshal an army to solve AI alignment. At the other end, you have essentially researchers who see work that should be done, lack the capacity to do it themselves, and so end up creating teams and orgs: “reluctant founders”.
It’s reasonable to be skeptical about what the “funder type” end of the spectrum will do.
In normal startups, the ultimate feedback loop is provided by the market. In AI safety nonprofits, the main feedback loops are provided by funders, AGI labs, and Bay Area prestige gradients.
Bay Area prestige gradients are to a large extent captured by the AGI labs: the majority of quality-weighted “AI safety” work already happens there, the work is “obviously impactful”, you are close to the game, and so on; normal ML people also want to work there.
If someone wants to scale a lot, “funders” means mostly OpenPhil, since no other source would fund the army. The dominant OpenPhil worldview is closely related to Anthropic’s; for example, until recently you could hear from senior OP staff that working in the labs is often strategically the best thing you can do.
Taken together, it’s reasonable to expect the “funder type” to be captured by the incentive landscape and to work on things closely aligned with AGI developers: what the people working there want, need, or endorse, and/or what OP likes.
(A MATS skeptic could say this is also true of MATS: the main thing going on seems to be recruiting and training ML talent to work for “the labs”. On this view, given that AI safety is funding-constrained, it is unclear why scarce AI safety funding is best deployed to make recruitment and training easier for extremely well-resourced companies.)
Personally, I’m more optimistic about people somewhere around 70% of the way toward the research end of the spectrum, who mostly have some research taste, strategy, and judgement, but I don’t think the interventions you propose will attract them.
I like this comment. I think it’s easy to overfit on the most salient research agendas, especially if there are echo chambers and tight coupling between highly paid frontier AI staff and nonprofit funders. The best ways I know to combat this at MATS are:
Maintaining a broad church of AI safety research, including deliberately making mentor “diversity picks” and choosing a mentor selection committee that contains divergent thinkers. As another example, I think Constellation has done a good job recently at expanding member diversity and reducing echo chambers.
Requiring that COIs be declared and mitigated, including along reporting chains, at the same organization, with romantic/sexual partners, and with frequent research collaborators.
Encouraging “scout mindset” and “reasoning transparency”, especially among people with divergent beliefs. I think this is a large strength of MATS: we are a melting pot for ideas and biases.
Note that I expect overfitting to decrease with further scale and diversity, provided the above practices are adhered to!