What Makes an AI Startup “Net Positive” for Safety?
In light of the recent news from Mechanize/Epoch and the community discussion it sparked, I’d like to open a conversation about a question some of us grapple with: What constitutes a net-positive AI startup from an AI safety perspective, and what steps can founders take to demonstrate goodwill and navigate the inherent pressures?
This feels like an important conversation to have because the AI safety community seems to be increasingly encouraging people to build startups (due to a lack of funding, a potentially higher ceiling for society-wide impact, etc.). I’ve thought a lot about this and have been going back and forth on it for the past three years. You get constant whiplash.
“You’re building an evals org? Isn’t that just doing the homework for the labs? And besides, none of it will scale to ASI. You’ll just safety-wash labs along the way.”
“You’re doing jailbreaks? This will just get solved at some point, and, honestly, it doesn’t tackle the core of the alignment problem.”
“Automated research? Isn’t that just capabilities? You claim that you’ll focus on non-frontier capability things, but incentives are a powerful thing.”
As someone in the AI safety space who is certainly concerned about the risks (takeover and human disempowerment), but who also feels that building a for-profit would be a good personal fit, I have not found it easy to decide what to do. You get completely different answers from different people in the community as to whether Startup X is net positive (and sometimes the same person will give you the opposite opinion a month later). At some point, you either:
Take the safe option by not building a startup (so continue to do independent research, work at a research org, or create a non-profit).
Build a startup that doesn’t tackle important parts of AI safety (it isn’t bad per se, and it’s much easier to make money).
Outside of cybersecurity and evals, it’s not particularly easy to find domains with a big enough market size while tackling important AI safety work.
Build the controversial startup and go with your gut (the feeling that it is overall a net positive bet).
In my case, I started going in the startup direction because I wanted to force myself to think creatively about what a net positive startup for AI safety would look like and hopefully bring more money into the space (to hire AI safety researchers). I thought, “AI safety conscious people tend to give up too easily when trying to conceive of a net positive alignment startup,” so I hoped I could land on something that would overall be net positive for superintelligence alignment and other risks.
We’ve been moving in the non-profit direction over the past two months, but we are reflecting on whether this is ideal (e.g. could we make it work as a startup while bringing more money to AI safety and reducing risks?). In our case (Coordinal Research), our work involves “automating (safety) research”, so it’s an area we want to be especially careful about. We’d like to engage with the AI safety community throughout the process so that we can be thoughtful about how we approach things (or decide not to pursue specific directions/products/services).
I’d love to hear what others in the community think about this[1]. For example:
If a startup slightly shortens timelines but directly addresses core problems of superalignment, is that still net negative? Could it argue that it lowers P(doom) overall, even if it causes some acceleration in timelines?
What type of output/product is likely to be net negative no matter what? Environments for training end-to-end RL? LLM scaffold companies? Evals on automated AI R&D?
What are the top positive examples and cautionary tales?
What kinds of AI startups would you like to see?
Thoughts on organizational structure?
It’s unlikely we’ll get complete agreement on many of these questions since people in the space sometimes have opposite opinions on what is good or bad from an AI safety perspective, but I’m looking forward to hearing the community’s thoughts on how to navigate these trade-offs in AI safety entrepreneurship.
[1] Here are some related posts about this:
I think almost all startups are really great! I think there really is a very small set of startups that end up harmful for the world, usually by specifically making a leveraged bet on trying to create very powerful AGI, or accelerating AI-driven R&D.
Because, in some sense, expecting a future with both of these technologies is what distinguishes our community from the rest of the world, if you end up optimizing for profit, leaning into exactly those technologies ends up being a surprisingly common thing for people to do (as that’s where the alpha of our community relative to the rest of the world lies), which I do think is really bad.
As a concrete example, I don’t think Elicit is making the world much worse. I think its sign is not super obvious, but I don’t have the feeling they are accelerating timelines very much, making takeoff sharper, or exhausting political coordination goodwill. Similarly, I think Midjourney is good for the world, as is Suno, as are basically all the AI art startups. I do think they might drive investment into the AI sector, and that might pull them into the red, but inasmuch as we want to have norms, and I give people advice on what to do, working on those things feels like it could definitely be net good.
I think you’re kind of avoiding the question. What startups are really great for AI safety?
I don’t think we should have norms or a culture that requires everything everyone does to be good specifically for AI Safety. Startups are mostly good because they produce economic value and solve problems for people. Sometimes they do so in a way that helps with AI Safety, sometimes not. I think Suno has helped with AI Safety because it has allowed us to make a dope album that made the rationalist culture better. Midjourney has helped us make LW better. But mostly, they just make the world better the same way most companies in history have made the world better.
Well yeah, but the question here is “what should the community’s guidelines be for approaching startups specifically aimed at helping with AI safety” (which may or may not involve AI), not “what kinds of AI startups should people start, if any?”
Yeah, thanks! I agree with @habryka’s comment, though I’m a little worried it may shut down conversation since it might make people think the conversation is about AI startups in general and less about AI startups in service of AI safety. This is because people might consider the debate/question answered after agreeing with the top comment.
That said, I do hear the “any AI startup is bad because it increases AI investment and therefore shortens timelines” argument, so I think it’s worth at least getting more clarity on this.
Ah, I see. I did interpret the framing around “net positive” to be largely about normal companies. It’s IMO relatively easy to be net-positive, since all you need to do is avoid harm in expectation and help in any way whatsoever, which my guess is almost any technology startup with reasonable people at the helm that doesn’t accelerate things can achieve.
When we are talking more about “how to make a safety-focused company that is substantially positive on safety?”, that is a very different question in my mind.
Nod, but fwiw if you don’t have a cached answer, I am interested in you spending like 15 minutes thinking through whether there exist startup-centric approaches to helping with x-risk that are good.
Anything that improves interpretability seems like a no brainer.
With the caveat that it has to actually improve interpretability in a provable fashion, and not just make us think it did, like we saw with CoTR.
This seems like a fairly important premise to your position but I don’t think it’s true. Many safety-conscious people have started for-profit companies. As far as I can tell, every single one of those companies has been net negative. Safety-conscious people are starting too many companies, not too few.
I’m not confident that there are no net-positive AI startup ideas. But I’m confident that for the median randomly-chosen idea that someone thinks is net positive, it’s actually net negative.
I think the statement in the parent comment is too general. What I should have said is that every generalist frontier AI company has been net negative. Narrow AI companies that provide useful services and have ~zero chance of accelerating AGI are probably net positive.
Got it, I was going to mention that Haize Labs and Gray Swan AI seem to be doing great work in improving jailbreak robustness.
I won’t comment on your specific startup, but I wonder in general how an AI Safety startup becomes a successful business. What’s the business model? Who is the target customer? Why do they buy? Unless the goal is to get acquired by one of the big labs, in which case, sure, but again, why or when do they buy, and at what price? Especially since they already don’t seem to be putting much effort into solving the problem themselves despite having better tools and more money to do so than any new entrant startup.
Some quick thoughts on which types of companies are net positive, negative or neutral:
Net positive:
Focuses on interpretability, alignment, evals, security (e.g. Goodfire, Gray Swan, Conjecture, Deep Eval).
Net negative:
Directly intends to build AGI without a significant commitment to invest in safety (e.g. Keen Technologies).
Shortens timelines or worsens race dynamics without any other upside to compensate for that.
Neutral / debatable:
Applies AI capabilities to a specific problem without generally increasing capabilities (e.g. Midjourney, Suno, Replit, Cursor).
Keeps up with the capabilities frontier while also having a strong safety culture and investing substantially in alignment (e.g. Anthropic).
A startup could disincentivize itself from becoming worth more than a billion dollars by selling an option to buy it for a billion dollars.
Not to say it’s impossible, but OpenAI tried to do this with the profit cap and it didn’t work.
They did the opposite, incentivizing themselves to reach the profit cap. I’m talking about making sure that any net worth beyond a billion goes to someone else.
I didn’t track this previously: how did they incentivize themselves to reach the cap?
If your investors only get paid up to 100x their investment, you want to go for strategies that return much more than 100x if they work.
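To make the incentive concrete, here is a rough numeric sketch. The 100x cap is the only figure taken from the comment above; the two strategies, their payoffs, and their probabilities are hypothetical numbers chosen purely for illustration:

```python
# Hypothetical payoffs illustrating how a 100x cap on investor returns
# skews incentives toward long-shot strategies (numbers are made up).

CAP = 100  # investors are paid at most 100x their investment


def split_returns(total_multiple, cap=CAP):
    """Split a total return multiple between capped investors and the residual holder."""
    investor_share = min(total_multiple, cap)
    residual_share = max(total_multiple - cap, 0)
    return investor_share, residual_share


# Strategy A: a near-certain 50x outcome (never exceeds the cap).
# Strategy B: a 10% shot at 10,000x, otherwise nothing.
strategies = {
    "A (safe, 50x for sure)": [(1.0, 50)],
    "B (10% chance of 10,000x)": [(0.10, 10_000), (0.90, 0)],
}

for name, outcomes in strategies.items():
    exp_investor = sum(p * split_returns(m)[0] for p, m in outcomes)
    exp_residual = sum(p * split_returns(m)[1] for p, m in outcomes)
    print(f"{name}: investors ~{exp_investor:.0f}x, residual holder ~{exp_residual:.0f}x")

# Expected values: strategy A gives investors ~50x and the residual holder ~0x,
# while strategy B gives investors only ~10x but the residual holder ~990x.
# Whoever captures value above the cap only wins from strategies that blow far
# past 100x, so the cap pushes toward strategy B.
```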
The value of the startup is only loosely correlated with being positive for AI safety (capabilities are valuable, but they’re not the only valuable thing). Ideally the startup would be worth billions if and only if AI safety were solved.
Isn’t it logically impossible to determine this without a widely accepted definition of “Artificial Intelligence” in the first place? (one accepted by all relevant parties)
Surprisingly, 6 different users downvoted a straightforward comment without explanation… so I imagine that also indicates a bot problem that has to be factored in (or at least users replicating bot behavior).
I downvoted because it seems obviously wrong and irrelevant to me.
There clearly are bots capable of typing this out… unless you’ve never browsed through twitter/X before? (which seems very implausible considering your >6-year-old account)
A meaningful non-bot explanation would contain some reason(s) for the opinion. I don’t think you need me to enumerate them, but I’ll give you the benefit of the doubt: it’s usually provided in the form of a logical argument, linked sources, a comparison, etc.