First, there are only three to ten frontier AI developers, depending on how you count. So, naively, the marginal risk from any one developer should be only a factor of 3–10 smaller than the total risk.
I don’t think this holds. Suppose that there are three developers, and each of them is independently 80% likely to develop ASI up to the level where it could kill everyone. Each of them is 10% likely by that point to instill a sufficient degree of alignment such that it won’t actually do so. Each of them is roughly equally likely to achieve the deadly level of capability first, and the fate of the world depends upon whether that particular one is sufficiently well aligned—if so, it will sufficiently prevent more misaligned ASI arising in the future.
Despite its oversimplified nature, this is clearly a terrible scenario. There is a >99% chance that ASI is developed, with an overall 89.28% chance of doom. If there were only two developers, then there would be a 96% chance of ASI with an 86.4% chance of doom.
So the marginal risk of doom due to having 3 developers instead of 2 is only 2.88%, which is very much less than 1⁄3 of the total risk. The great majority of the marginal risk was in the 0 → 1 transition.
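To spell out the arithmetic behind those figures (a sketch of the same toy model, with $n$ the number of developers, each independently reaching the deadly capability level with probability 0.8 and the first mover being adequately aligned with probability 0.1):

$$P(\text{ASI}) = 1 - 0.2^{\,n}, \qquad P(\text{doom}) = 0.9\,\bigl(1 - 0.2^{\,n}\bigr)$$

This gives doom probabilities of 72%, 86.4%, and 89.28% for one, two, and three developers, so the marginal contributions of the first, second, and third developers are 72%, 14.4%, and 2.88% respectively.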
As you say, a major problem is that such a calculation of marginal risk ignores the possibility of coordination, and also ignores the increased likelihood of defection from having more parties involved.
Well, I did say “Naively”… but yes I agree the analysis was too naive, and I will edit the post. You make a good point that it can be improved by considering that harms from AI (especially large-scale ones like x-risk) are overdetermined when there are multiple developers. The naive analysis is more accurate when the risk is smaller.
As a side note, if the risk from a single project is so large, then the first project is probably disincentivized at the individual level (would you really want to take an 80% risk of extinction?), and it’s a “pure” coordination problem, like a stag hunt, rather than an incentive problem (like a prisoner’s dilemma).
Another way the “naive” calculation can be wrong (which is the main one I had in mind) is if the risks of different projects are correlated, which they are, e.g. because they are all using similar technology.
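As an illustrative limiting case: if the projects’ risks were perfectly correlated, say because they all rely on essentially the same technique, which either does or does not yield a misaligned ASI, then the number of projects stops mattering at all:

$$P(\text{doom} \mid n \text{ projects}) = P(\text{doom} \mid 1 \text{ project})$$

so the marginal risk from each additional project is zero, and intermediate degrees of correlation fall between this limit and the independent-risks calculation above.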