I agree the AI safety field in general vastly undervalues building things, especially compared to winning intellectual status ladders (e.g. LessWrong posting, passing the Anthropic recruiting funnel, etc.).

However, as I’ve written before:
[...] the real value of doing things that are startup-like comes from [...] creating new things, rather than scaling existing things [...]
If you want to do interpretability research in the standard paradigm, Goodfire exists. If you want to do evals, METR exists. Now, new types of evals are valuable (e.g. Andon Labs & Vending-Bench). And maybe there’s some interp paradigm that offers a breakthrough.
But why found? Because there is a problem where everyone else is dropping the ball, so there is no existing machine where you can turn the crank and get results towards that problem.

Now of course I have my opinions on where exactly everyone else is dropping the ball. But no doubt there are other things as well.
To pick up the balls, you don’t start the 5th evals company or the 4th interp lab. My worry is that that’s what all the steps listed in “How to be a founder” point towards. Incubators, circulating pitches, asking for feedback on ideas, applying to RFPs, talking to VCs—all of these are incredibly externally-directed, non-object-level, meta things. Distilling the zeitgeist. If a ball is dropped, it is usually because people don’t see that it is dropped, and you will not discover the droppedness by going around asking “hey, what ball is dropped that the ecosystem is not realizing?”. You cannot crowdsource the idea.
This relates to another failure of AI safety culture: insufficient and bad strategic thinking, and a narrow-mindedness about solutions. “Not enough building” and “not enough strategy/ideas” sound opposed when you put them on some sort of academic-vs-doer spectrum. But the real spectrum is whether you’re winning or not, and “a lack of progress because everyone is turning the same few cranks and no one is doing concrete building towards the goal” and “the existing types of large-scale efforts are wrong or insufficient” are, in a way, related failure modes.
Also, of course, beware of the skulls. “A frontier lab pursuing superintelligence, except actually good, this time, because we are trustworthy people and will totally use our power to take over the world for only good”
I definitely think marginal founders should focus on low-hanging fruit for impact. Do you have a list of potential startup ideas you like?
I have a different opinion about the utility of red-teaming pitches/ToCs; based on experience, I think this can help spot blind spots in the ecosystem! I also think many AI safety founders, funders, etc. are walking around with a long list of things they want someone to build; I have one, at least, and I’ve read a few.
I’m also not so sure that another evals or auditing company would be bad. There are only 3-4 decent-sized AI safety evals orgs! That’s a small number of people to analyze large, ever-changing models with vast threat surfaces. There’s plenty of room for differentiation and specialization (e.g., biorisk, cyber-risk, AI control evals, AI elicitation evals, human manipulation risk, bio R&D capabilities, AI coordination risk, etc.).
Maybe this is irrelevant, but I’d be surprised if a tech founder were deterred from founding a startup just because a similar startup already exists, so long as there was high demand. In some cases, I might be concerned (e.g., regulatory capture of token government auditors), but I’m not concerned by a doubling of Apollo, Goodfire, METR, Transluce, MATS, etc. Competition can be good! Maybe not as good as filling a gap, but it doesn’t seem net harmful to have more orgs working on the same problem; there’s plenty of funding, space to differentiate, and problems to work on!
If you want to do interpretability research in the standard paradigm, Goodfire exists.
for what it’s worth, I think Goodfire is taking a non-standard approach to interpretability research—more so than (e.g.) Transluce. (I’m not claiming that the non-standard approach is better than the standard one.)