I don’t know where the idea of “marginal risk” came from in AI policy. It sounds like BS. Yet another excuse to keep building dangerous AI systems…
The basic idea is that instead of looking at how likely your AI system is to lead to millions of deaths, you ask “given that other people are already building AI systems that might lead to millions of deaths, how much worse will I make things, if I build one more?” And then you only feel bad about that “marginal” contribution.
Does this exist in other areas? Imagine there are labs performing biological gain-of-function research (the kind that likely led to the COVID pandemic). Suppose your lab is doing a shit job of security and you estimate that every month, there’s a 1% chance that you cause another global pandemic. Now suppose that you learn that there is another lab that is doing an equally shit job of security. Does that make it OK, what you are doing? Obviously not.
So when exactly are you allowed to say, “The marginal risk that I might kill someone is small, because other people are also behaving recklessly, so I should be allowed to keep doing what I’m doing”? Generally not when the reckless behavior in question actually kills people!
So that’s the “why aren’t we following accepted norms and standard practice?” objection. It applies to a lot of AI stuff, actually. See also: “evals” versus the engineering practices used for other safety-critical technologies; it’s a world of difference.
There’s also the (closely related) moral objection: since when is “everyone else is doing it” an OK excuse? I talked about this previously; the answer is, in fact, “sometimes”, but I’d be surprised to find “when you are putting everyone’s lives at risk” in most people’s “acceptable” bucket.
Another closely related objection is that it’s anti-cooperative. Obviously the thing we should be doing is coordinating to not build any AIs that pose unacceptable risks. This isn’t a critique of individual actors making decisions this way, but of the normalization of a policy choice that leaves us with these unacceptable risks. What do you call such a policy? Unacceptable.
But I have a few more practical objections to considering marginal risk that I suspect some people might find more compelling.
First, there are only three to ten frontier AI developers, depending on how you count. So naively, the marginal risk from one developer should only be 3-10 times lower than the total risk. I don’t think a 3-10x reduction in the probability of human extinction, for instance, is likely to bring the current risk levels from “unacceptable” to “acceptable”. In practical terms, building one more frontier AI system gives us one more chance to mess it up and suffer the consequences. That’s non-trivial!
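Here’s that naive accounting as a quick sketch; the 10% total risk is a made-up number, purely for illustration:

```python
# Naive accounting: split the total risk evenly across developers,
# so each one "owns" total_risk / N. The 10% figure is hypothetical.
total_risk = 0.10  # illustrative total probability of catastrophe

for n_developers in (3, 10):
    naive_marginal = total_risk / n_developers
    print(f"{n_developers} developers: naive marginal risk {naive_marginal:.2%} each")
# 3 developers:  naive marginal risk 3.33% each
# 10 developers: naive marginal risk 1.00% each
```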
But also: marginal risk can be made to look small at every step, simply by taking smaller steps while moving at the same overall pace. An increase in risk that would seem too large as a single jump can be reached by inching towards it. If you were doing this unilaterally, with nobody else to compare against, we could instead say you’re allowed a certain rate of risk increase over time. But I haven’t seen anybody propose that.
Also, when there are N different AI developers, the overall rate of risk increase gets multiplied by roughly a factor of N: let 10 companies each add 1% risk in a month, and you get roughly a 10% increase that month. It’s basically a recipe for an incremental race to the bottom.
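A toy calculation (with made-up 1% and 10% figures) makes both of these points concrete:

```python
# Toy numbers, purely for illustration.

# Smaller steps: ten 1% risk increases vs. one 10% jump.
ten_small_steps = 1 - 0.99 ** 10
print(f"ten 1% steps: {ten_small_steps:.1%} vs. one 10% jump: 10.0%")
# ~9.6% either way: same destination, but each small step looks
# like an "acceptably small" marginal increase on its own.

# Race to the bottom: ten developers each adding 1% risk in one month,
# treated as independent for simplicity.
monthly_industry_increase = 1 - 0.99 ** 10
print(f"industry-wide increase per month: {monthly_industry_increase:.1%}")
```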
Conclusion
Thinking “on the margin” is sometimes useful and appropriate, but there are good reasons we don’t usually do it for harming others, or for risking harm. The way I’ve seen this phrase used in AI discourse is to “reasonableness-wash” something that is actually quite silly. Marginal-risk thinking gives us baby steps towards catastrophe.
I’m not defending the argument for Anthropic to exist, but you are entirely missing it here. I don’t think anyone is justifying work on AGI using the notion of marginal risk you’re attacking. They are doing it because they think their marginal contribution is improving the odds of a good future.
I’m not going after particular people’s justifications for their work; I’m going after the institutionalization of “marginal risk” as a relevant concept and the way it justifies unacceptable risk-taking.
I don’t think this holds. Suppose that there are three developers, and each of them is independently 80% likely to develop ASI up to the level where it could kill everyone. Each of them is 10% likely by that point to instill a sufficient degree of alignment such that it won’t actually do so. Each of them is roughly equally likely to achieve the deadly level of capability first, and the fate of the world depends upon whether that particular one is sufficiently well aligned—if so, it will sufficiently prevent more misaligned ASI arising in the future.
Despite its oversimplified nature, this is clearly a terrible scenario. There is a >99% chance that ASI is developed, with an overall 89.28% chance of doom. If there were only two developers, there would be a 96% chance of ASI with an 86.4% chance of doom.
So the marginal risk of doom due to having 3 developers instead of 2 is only 2.88%, which is very much less than 1⁄3 of the total risk. The great majority of the marginal risk was in the 0 → 1 transition.
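For anyone who wants to check the arithmetic, here is the scenario sketched in code, using exactly the assumptions stated above (nothing more):

```python
# Toy scenario: each of n developers independently reaches deadly ASI
# capability with probability 0.8; whichever gets there first is
# sufficiently aligned with probability 0.1; a misaligned first ASI
# means doom.
def p_asi(n: int) -> float:
    return 1 - 0.2 ** n        # at least one developer gets there

def p_doom(n: int) -> float:
    return p_asi(n) * 0.9      # the first ASI is misaligned 90% of the time

for n in (1, 2, 3):
    print(f"n={n}: P(ASI) = {p_asi(n):.2%}, P(doom) = {p_doom(n):.2%}")
# n=1: P(ASI) = 80.00%, P(doom) = 72.00%
# n=2: P(ASI) = 96.00%, P(doom) = 86.40%
# n=3: P(ASI) = 99.20%, P(doom) = 89.28%

print(f"marginal doom from 3rd developer: {p_doom(3) - p_doom(2):.2%}")  # 2.88%
```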
As you say, a major problem is that such a calculation of marginal risk ignores the possibility of coordination, and also ignores the increased likelihood of defection from having more parties involved.
Well, I did say “Naively”… but yes I agree the analysis was too naive, and I will edit the post. You make a good point that it can be improved by considering that harms from AI (especially large-scale ones like x-risk) are overdetermined when there are multiple developers. The naive analysis is more accurate when the risk is smaller.
As a side note, if the risk from a single project is so large, then the first project is probably disincentivized at the individual level (would you really want to take an 80% risk of extinction?), and it’s a “pure” coordination problem, like a stag hunt, rather than an incentive problem (like a prisoner’s dilemma).
Another way the “naive” calculation can be wrong (and the main one I had in mind) is if the risks of different projects are correlated, which they are, e.g. because they are all using similar technology.
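To illustrate with a toy common-cause model (all numbers hypothetical): when projects share a failure mode, each additional project adds much less than total/N.

```python
# Toy common-cause model with hypothetical numbers: every project shares
# one failure mode (e.g. a flaw in the common underlying technique),
# plus a small independent failure probability of its own.
q_shared = 0.05  # probability the shared technique is catastrophically flawed
p_idio = 0.01    # each project's independent failure probability

def p_catastrophe(n_projects: int) -> float:
    # Safe only if the shared flaw is absent AND every project avoids
    # its own idiosyncratic failure.
    return 1 - (1 - q_shared) * (1 - p_idio) ** n_projects

for n in (1, 2, 3):
    print(f"n={n}: total risk {p_catastrophe(n):.2%}")
# n=1: total risk 5.95%
# n=2: total risk 6.89%
# n=3: total risk 7.82%
# Each extra project adds ~0.9%, far below total/n: correlation makes
# additional projects look deceptively cheap on the margin.
```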