I like Wentworth’s toy model, but I want it to have more numbers, so I made some up. This leads me to the opposite conclusion from Wentworth’s.
I think 2-20% is a pretty sensible range for successful intentional scheming by early AGI.
Assume the Phase One Risk is 10%.
Superintelligence is extremely dangerous by (strong) default. It will kill us or at least permanently disempower us, with high probability, unless we solve some technical alignment problems before building it.
Assume the Phase Two Risk is 99%. Also assume:
Spending an extra billion dollars on AI control reduces Phase One Risk from 10% to 5%.
Spending an extra billion dollars on AI alignment reduces Phase Two Risk from 99% to 98%.
The justification for these numbers is that each extra billion dollars buys us a “dignity point”, i.e. roughly +1 to the log-odds of survival: each billion roughly doubles the odds of surviving the phase it targets. This assumes that both research fields are similarly neglected and tractable.
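As a quick sanity check (mine, not Wentworth’s), both spending options above buy roughly the same number of log-odds: cutting the targeted risk as above roughly doubles the odds of surviving that phase. Measured in bits:

```python
import math

def log_odds_bits(p_survive: float) -> float:
    """Log-odds of survival in bits (log base 2)."""
    return math.log2(p_survive / (1 - p_survive))

# AI control: Phase One Risk 10% -> 5%, i.e. phase-one survival 90% -> 95%
print(log_odds_bits(0.95) - log_odds_bits(0.90))  # ~1.08 bits

# AI alignment: Phase Two Risk 99% -> 98%, i.e. phase-two survival 1% -> 2%
print(log_odds_bits(0.02) - log_odds_bits(0.01))  # ~1.01 bits
```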
Therefore:
Baseline: by default we survive both phases with probability 90% × 1% = 0.9%, so we get 9 milli-lightcones in expectation.
If we spend on AI control we get 95% × 1% = 9.5 milli-lightcones, +0.5 over baseline.
If we spend on AI alignment we get 90% × 2% = 18 milli-lightcones, +9 over baseline.
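The arithmetic behind these numbers, as a minimal sketch: survival requires getting through both phases, so the expected share of the lightcone is just the product of the two survival probabilities.

```python
def expected_millilightcones(phase_one_risk: float, phase_two_risk: float) -> float:
    """Expected share of the lightcone kept, in thousandths (milli-lightcones)."""
    return (1 - phase_one_risk) * (1 - phase_two_risk) * 1000

print(expected_millilightcones(0.10, 0.99))  # baseline:               ~9
print(expected_millilightcones(0.05, 0.99))  # extra $1B on control:   ~9.5
print(expected_millilightcones(0.10, 0.98))  # extra $1B on alignment: ~18
```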
We should therefore spend billions of dollars on both AI control and AI alignment; both are very cost-efficient. This conclusion is robust to many different assumptions, provided that overall P(Doom) < 100%. So this model is not really a “case against AI control research”.
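To illustrate the robustness claim, here is a rough sensitivity sweep. It extrapolates beyond the numbers above by assuming the marginal billion always halves the Phase One Risk (control) or doubles the Phase Two survival probability (alignment); those generalizations are mine, not Wentworth’s.

```python
def control_gain(p1: float, p2: float) -> float:
    """Extra milli-lightcones from halving Phase One Risk."""
    return ((1 - p1 / 2) - (1 - p1)) * (1 - p2) * 1000

def alignment_gain(p1: float, p2: float) -> float:
    """Extra milli-lightcones from doubling Phase Two survival (capped at 1)."""
    return (1 - p1) * (min(2 * (1 - p2), 1.0) - (1 - p2)) * 1000

for p2 in (0.50, 0.90, 0.99, 0.999):
    print(p2, control_gain(0.10, p2), alignment_gain(0.10, p2))
# Both gains stay positive for any Phase Two Risk below 100%.
```

With these made-up numbers, both interventions remain worth billions for any Phase Two Risk short of certainty, which is the point of the conclusion above.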