This is a sentiment I’ve heard often when discussing AI safety for-profits: that they tend to abandon their original mission in favour of a neutral or actively harmful objective. I am not aware of enough examples to treat this as an established pattern.
Anthropic is the strongest example, having started explicitly as a safety company and now clearly accelerating the frontier. However, they have recently used their position to advocate for restricting the use of autonomous weapons and curtailing mass surveillance. Anthropic shows that this kind of influence is possible (although whether they have been net positive is a more complicated question). I am not aware of enough examples of AI safety for-profit startups being derailed to agree that this is a structural flaw we should treat as disqualifying.
By some estimates, the ratio of AI safety to capabilities funding is roughly 1:250. In that situation, it seems less important to find approaches that are merely unlikely to be net negative (such as many AI safety research orgs) and more important to find approaches that could be strongly net positive. As I outlined, poor alignment with the original mission exists in research and advocacy orgs as well, and for-profits are not uniquely predisposed to harmful effects. I believe there is a significant cost to being so risk-averse about an entire category of intervention.
I’m curious about your thoughts on:
1. Evidence for frequent failures of AI safety startups. Many people I have spoken to believe this happens, but I don’t see the examples I would expect if it were common. (Depending on your views of the frontier labs, these may be a small number of failures that nevertheless make the for-profit structure strongly net negative overall. However, this specific accelerationist failure mode may be unique to the lab structure rather than to for-profit companies in general?)
2. How these ideas translate to research and advocacy orgs. What bar for integrity and clarity of plan should we demand from them? Non-profits have their own misalignment pressures, and I’m not sure the bar applied to them is equally demanding.