I’m somewhat optimistic that AI takeover might not happen (or might be very easy to avoid) even given no policy interventions whatsoever, i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership. Perhaps I’d give a 50% chance of takeover with no policy effort whatsoever to avoid it, compared to my 22% chance of takeover with realistic efforts to avoid it.
I think it’s pretty hard to talk about “no policy effort whatsoever,” or to distinguish voluntary measures from government regulation, and so on. So it’s not totally clear what the “conditioned on no intervention” number means, and I think that’s actually a pretty serious ambiguity.
That said, I do think my 50% vs your 95% points at a real disagreement—I feel like I have very little idea about how real a problem takeover will be, and have been so far unpersuaded by arguments that takeover is a very strong default. If you are confident that it’s a real problem that will be hard to fix, it might be reasonable to just double my takeover probabilities to take that into account.
Actually I think my view is more like 50% from AI systems built by humans (compared to 15% unconditionally), if there is no effort to avoid takeover.
If you continue assuming “no effort to avoid takeover at all” into the indefinite future then I expect eventual takeover is quite likely, maybe more like 80-90% conditioned on nothing else going wrong, though in all these questions it really matters a lot what exactly you mean by “no effort” and it doesn’t seem like a fully coherent counterfactual.
To clarify, the conditional probability in the parent comment is not conditioned on no policy effort or intervention, it’s conditional on whatever policy / governance / voluntary measures are tried being insufficient or ineffective, given whatever the actual risk turns out to be.
If a small team hacking in secret for a few months can bootstrap to superintelligence using a few GPUs, the necessary level of policy and governance intervention is massive. If the technical problem has a somewhat different nature, then less radical interventions are plausibly sufficient.
I personally feel pretty confident that:
Eventually, and maybe pretty soon (within a few years), the nature of the problem will indeed be that it is plausible that a small team can bootstrap to superintelligence in secret, without massive resources.
Such an intelligence will be dramatically harder to align than it is to build, and this difficulty will be non-obvious to many would-be builders.
And believe somewhat less confidently that:
The governance and policy interventions necessary to robustly avert doom given these technical assumptions are massive and draconian.
We are not on track to see such interventions put in place.
Given different views on the nature of the technical problem (the first two bullets), you can arrive at a different level of intervention that you think is required for robust safety (the third bullet), and a different estimate of whether such an intervention will be put in place successfully (the fourth bullet).
I think it’s also useful to think about cases where policy interventions were (in hindsight) obviously not sufficient to prevent doom robustly, but by luck or miracle (or weird anthropics) we make it through anyway. My estimate of this probability is that it’s really low—on my model, we need a really big miracle, given actually-insufficient intervention. What “sufficient intervention” looks like, and how likely we are to get it, I find much harder to estimate.
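The distinction the thread keeps circling—conditioning on “no effort” versus conditioning on “whatever measures are tried being insufficient”—is just a law-of-total-probability decomposition. A minimal sketch, with all numbers purely illustrative placeholders (not anyone’s actual stated credences):

```python
# Decompose an unconditional takeover estimate into conditional pieces
# via the law of total probability. Every number here is a hypothetical
# placeholder chosen for illustration only.

p_insufficient = 0.4                 # P(measures tried turn out insufficient)
p_takeover_if_insufficient = 0.5     # P(takeover | measures insufficient)
p_takeover_if_sufficient = 0.02      # small residual risk even with sufficient measures

# P(takeover) = P(insufficient) * P(takeover | insufficient)
#             + P(sufficient)   * P(takeover | sufficient)
p_takeover = (p_insufficient * p_takeover_if_insufficient
              + (1 - p_insufficient) * p_takeover_if_sufficient)

print(round(p_takeover, 3))  # 0.212
```

The point of the sketch is only that an unconditional number and a conditional number can both be coherent at once; disagreements can live in either factor, which is why pinning down what is being conditioned on matters.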
Are you assuming that avoiding doom in this way will require a pivotal act? It seems that, absent policy intervention and societal change, even if some firms exhibit a proper amount of concern, many others will not.
It’s unclear whether some people being cautious and some people being incautious leads to an AI takeover.
In this hypothetical, I’m including AI developers selling AI systems to law enforcement and militaries, which are used to enforce the law and win wars against competitors using AI. But I’m assuming we wouldn’t pass a bunch of new anti-AI laws (and that AI developers don’t become paramilitaries).
i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership
I’m curious about how contingent this prediction is on (1) timelines and (2) the rate of alignment research progress. On (2), how much of your P(no takeover) comes from expectations about future research output from ARC specifically?
If tomorrow, all alignment researchers stopped working on alignment (and went to become professional tennis players or something) and no new alignment researchers arrived, how much more pessimistic would you become about AI takeover?
These predictions are not very related to any alignment research that is currently occurring. I think it’s just quite unclear how hard the problem is, e.g. does deceptive alignment occur, do models trained to honestly answer easy questions generalize to hard questions, how much intellectual work are AI systems doing before they can take over, etc.
I know people have spilled a lot of ink over this, but right now I don’t have much sympathy for confidence that the risk will be real and hard to fix (just as I don’t have much sympathy for confidence that the problem isn’t real or will be easy to fix).