I think there are some conditional and unconditional probabilities that are worth estimating to distinguish between disagreements about:
- the technical nature of intelligence and alignment, as technical problems
- how the future is likely to play out, given geopolitical considerations and models of how people and organizations with power are likely to act, effectiveness of AI governance efforts, etc.
The first is a conditional probability: the risk of extinction / takeover / disempowerment, given that AI governance is non-existent or mostly ineffective. In notation: p(doom | no societal shift / effective intervention).
My estimate (and I think the estimate of many other doom-ier people) is that this probability is very high (95%+): it is basically overdetermined that things will go pretty badly, absent radical societal change. This estimate is based on an intuition that the technical problem of building TAI seems to be somewhat easier than the technical problem of aligning that intelligence.
The next probability estimate is p(relevant actors / government / society etc. react correctly to avert the default outcome).
This probability feels much harder to estimate and shifts around much more, because it involves modeling human behavior on a global scale, in the face of uncertain future events. The worldwide reaction to COVID was a pretty big negative update; the developing AI race dynamics between big organizations are another negative update; some of the reactions to the FLI letter and the TIME article are small positive updates.
This probability also depends on the nature of the technical problem. For example, if aligning a superintelligence is harder than building one, but not much harder, then the intervention needed for a non-default outcome can probably be much smaller and less precisely targeted.
Overall, I’m not optimistic about this probability, but like many, I’m hesitant to put down a firm number. I do think, though, that in worlds where things do not go badly, lots of things look pretty radically different than they do now. And we don’t seem to be on track for that to happen.
(You can then get an unconditional p(doom) by multiplying these two probabilities (or the appropriate complements) together. But since my estimate of the first conditional probability is very close to 1, shifts and updates come almost entirely from the fuzzy / shifty second probability.)
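To make that arithmetic concrete, here is a minimal sketch of the multiplication. The 95% is the conditional estimate above; the value for the second probability is purely a hypothetical placeholder, not a number I’m endorsing:

```python
# Rough sketch of the decomposition described above, with illustrative numbers only.
p_doom_given_ineffective = 0.95  # the ~95% conditional estimate from above
p_react_correctly = 0.20         # hypothetical placeholder for the second, fuzzier probability

# Treating doom as negligible in worlds where the reaction succeeds, the
# unconditional estimate is roughly the product with the complement:
p_doom = p_doom_given_ineffective * (1 - p_react_correctly)
print(f"p(doom) ~= {p_doom:.2f}")  # 0.76 with these illustrative numbers
```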
I’m somewhat optimistic that AI takeover might not happen (or might be very easy to avoid) even given no policy interventions whatsoever, i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership. Perhaps I’d give a 50% chance of takeover with no policy effort whatsoever to avoid it, compared to my 22% chance of takeover with realistic efforts to avoid it.
I think it’s pretty hard to talk about “no policy effort whatsoever,” or to distinguish voluntary measures from government regulation, and so on. So it’s not totally clear what the “conditioned on no intervention” number means, and I think that’s actually a pretty serious ambiguity.
That said, I do think my 50% vs your 95% points to a real disagreement: I feel like I have very little idea of how real a problem takeover will be, and have so far been unpersuaded by arguments that takeover is a very strong default. If you are confident it’s a real problem that will be hard to fix, it might be reasonable to just double my takeover probabilities to take that into account.
Actually, I think my view is more like a 50% chance of takeover from AI systems built by humans (compared to 15% unconditionally), if there is no effort to avoid takeover.
If you continue assuming “no effort to avoid takeover at all” into the indefinite future, then I expect eventual takeover is quite likely, maybe more like 80-90% conditioned on nothing else going wrong. Though in all these questions it really matters a lot what exactly you mean by “no effort,” and it doesn’t seem like a fully coherent counterfactual.
To clarify, the conditional probability in the parent comment is not conditioned on there being no policy effort or intervention; it’s conditioned on whatever policy / governance / voluntary measures are tried being insufficient or ineffective, given whatever the actual risk turns out to be.
If a small team hacking in secret for a few months can bootstrap to superintelligence using a few GPUs, the necessary level of policy and governance intervention is massive. If the technical problem has a somewhat different nature, then less radical interventions are plausibly sufficient.
I personally feel pretty confident that:
- Eventually, and maybe pretty soon (within a few years), the nature of the problem will indeed be such that it is plausible a small team can bootstrap to superintelligence in secret, without massive resources.
- Such an intelligence will be dramatically harder to align than it is to build, and this difficulty will be non-obvious to many would-be builders.
And I believe, somewhat less confidently, that:
- The governance and policy interventions necessary to robustly avert doom given these technical assumptions are massive and draconian.
- We are not on track to see such interventions put in place.
Given different views on the nature of the technical problem (the first two bullets), you can get a different level of intervention that you think is required for robust safety (the third bullet), and a different estimate of how likely it is that such an intervention is put in place successfully (the fourth bullet).
I think it’s also useful to think about cases where policy interventions turn out (in hindsight) to be obviously insufficient to prevent doom robustly, but by luck or miracle (or weird anthropics) we make it through anyway. My estimate of this probability is that it’s really low: on my model, we need a really big miracle, given actually-insufficient intervention. What “sufficient intervention” looks like, and how likely we are to get it, I find much harder to estimate.
Are you assuming that avoiding doom in this way will require a pivotal act? It seems like, absent policy intervention and societal change, even if some firms exhibit a proper amount of concern, many others will not.
It’s unclear whether some people being cautious and some people being incautious leads to an AI takeover.
In this hypothetical, I’m including AI developers selling AI systems to law enforcement and militaries, which are used to enforce the law and win wars against competitors using AI. But I’m assuming we wouldn’t pass a bunch of new anti-AI laws (and that AI developers don’t become paramilitaries).
i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership
I’m curious how contingent this prediction is on (1) timelines, and (2) the rate of alignment research progress. On (2), how much of your P(no takeover) comes from expectations about future research output from ARC specifically?
If tomorrow, all alignment researchers stopped working on alignment (and went to become professional tennis players or something) and no new alignment researchers arrived, how much more pessimistic would you become about AI takeover?
These predictions are not very related to any alignment research that is currently occurring. I think it’s just quite unclear how hard the problem is, e.g. does deceptive alignment occur, do models trained to honestly answer easy questions generalize to hard questions, how much intellectual work are AI systems doing before they can take over, etc.
I know people have spilled a lot of ink over this, but right now I don’t have much sympathy for confidence that the risk will be real and hard to fix (just as I don’t have much sympathy for confidence that the problem isn’t real or will be easy to fix).