So far, AI safety companies have a very bad track record of sticking to their guns on safety. They almost inevitably devolve into making things worse. Not to say it’s impossible for a good AI safety startup to exist, but I would be EXTREMELY wary of any startup in the space, and they would need to clear a very high bar to demonstrate their integrity and why their “AI safety for profit” plan is going to work when it failed so many times before.
This is a sentiment that I’ve heard often when discussing AI safety for-profits: that they often abandon their original mission in favour of a neutral or actively harmful objective. I am not aware of enough examples to treat this as an established pattern.
Anthropic is the strongest example, starting explicitly as a safety company and now clearly accelerating the frontier. However, they have recently used their position to advocate for restrictions on the use of autonomous weapons and a curtailment of mass surveillance. Anthropic shows that this kind of influence is possible (although whether they have been net positive is a more complicated question). I am not aware of enough examples of AI safety for-profit startups being derailed to agree that this is a structural flaw we should treat as disqualifying.
By some estimates the ratio of AI safety to capabilities funding is roughly 1:250. In this kind of situation, it seems less important to find approaches which are likely not net negative (such as many AI safety research orgs) and more important to find approaches which could be strongly net positive. As I outlined, I think that poor alignment exists in research and advocacy orgs as well and that for-profits are not uniquely predisposed to harmful effects. I believe that there is a significant cost to being so risk-averse about a whole category of intervention.
I’m curious about your thoughts on:
Evidence for frequent failures of AI safety startups. Many people I have spoken to believe this, although I don’t see the examples I would expect to see. (Depending on your views of frontier labs, these may be a small number of failures which nevertheless give the for-profit structure a strongly net negative impact. However, this specific accelerationist failure mode might be unique to the lab structure rather than companies in aggregate?)
How these ideas translate to research and advocacy orgs. What bar for integrity and clarity of plan should we demand from them? Non-profits have their own misalignment pressures, and I’m not sure the bar applied to them is equally demanding.