What Makes an AI Startup “Net Positive” for Safety?

In light of the recent news from Mechanize/Epoch and the community discussion it sparked, I’d like to open a conversation about a question some of us grapple with: What constitutes a net-positive AI startup from an AI safety perspective, and what steps can founders take to demonstrate goodwill and navigate the inherent pressures?

This feels like an important conversation to have because the AI safety community has increasingly been encouraging people to build startups (due to a lack of funding, a potentially higher ceiling for society-wide impact, etc.). I’ve thought about this a lot and have been going back and forth on it for the past three years. You get constant whiplash.

  • “You’re building an evals org? Isn’t that just doing the homework for the labs? And besides, none of it will scale to ASI. You’ll just safety-wash labs along the way.”

  • “You’re doing jailbreaks? This will just get solved at some point, and, honestly, it doesn’t tackle the core of the alignment problem.”

  • “Automated research? Isn’t that just capabilities? You claim that you’ll focus on non-frontier capability things, but incentives are a powerful thing.”

As someone in the AI safety space who is genuinely concerned about the risks (takeover and human disempowerment) but who also feels like a good personal fit for building a for-profit, I haven’t found it easy to decide what to do. You get completely different answers from different people in the community as to whether Startup X is net positive (and sometimes the same person will give you the opposite opinion a month later). At some point, you either:

  • Take the safe option by not building a startup (i.e., continue doing independent research, work at a research org, or create a non-profit).

  • Build a startup that doesn’t tackle important parts of AI safety (it’s not bad per se, and it’s much easier to make money).

    • Outside of cybersecurity and evals, it’s not easy to find domains with a large enough market that also tackle important AI safety work.

  • Build the controversial startup after going with your gut (the feeling that it’s an overall net-positive bet).

In my case, I started down the startup path because I wanted to force myself to think creatively about what a net-positive startup for AI safety would look like, and hopefully bring more money into the space (to hire AI safety researchers). I thought, “AI-safety-conscious people tend to give up too easily when trying to conceive of a net-positive alignment startup,” so I hoped I could land on something that would be net positive for superintelligence alignment and for reducing other risks.

We’ve been moving in the non-profit direction over the past two months, but we are reflecting on whether this is ideal (e.g., could we make it work as a startup while bringing more money into AI safety and reducing risks?). In our case (Coordinal Research), what we do involves “automating (safety) research,” so it’s an area we want to be especially careful about. We’d like to engage with the AI safety community throughout the process so that we can be thoughtful about how we approach things (or decide not to pursue specific directions/products/services).

I’d love to hear what others in the community think about this. For example:

  • If a startup slightly shortens timelines but directly addresses core problems of superalignment, is it still net negative? Could it credibly claim to lower P(doom) overall, even while somewhat accelerating timelines?

  • What types of outputs/products are likely to be net negative no matter what? Environments for end-to-end RL training? LLM scaffolding companies? Evals of automated AI R&D?

  • What are the top positive examples and cautionary tales?

  • What kinds of AI startups would you like to see?

  • Thoughts on organizational structure?

We’re unlikely to reach complete agreement on many of these questions, since people in the space sometimes hold opposite views on what is good or bad from an AI safety perspective, but I look forward to hearing the community’s thoughts on how to navigate these trade-offs in AI safety entrepreneurship.
