So, I agree p(doom) has a ton of problems. I’ve really disliked it for a while. I also really dislike the way it tends towards explicitly endorsed evaporative cooling, in both directions; i.e., if your p(doom) is too [high / low] then someone with a [low / high] p(doom) will often say the correct thing to do is to ignore you.
But I also think “What is the minimum necessary and sufficient policy that you think would prevent extinction?” has a ton of problems of its own, problems that would tend to make it pretty bad as a centerpiece of discourse and not useful as a method of exchanging models of how the world works.
(I know this post does not really endorse this alternative; I’m noting, not disagreeing.)
So some problems:
Whose policy? A policy enforced by treaty at the UN? The policy of regulators in the US? An international treaty policy—enforced by which nations? A policy (in the sense of mapping from states to actions) that is magically transferred into the brains of the top 20 people at the top 20 labs across the globe? …a policy executed by OpenPhil??
Why a single necessary and sufficient policy? What if the most realistic way of helping everyone is several policies that are by themselves insufficient, but together sufficient? Doesn’t this focus us on dramatic actions unhelpfully, in the same way that a “pivotal act” arguably so focuses us?
The policy necessary to save us will, of course, be downstream of whatever model of AI world you have going on, so this question (like p(doom)) seems to focus you on things that are downstream of whatever actually matters. It might be useful for coalition formation (which does seem to be MIRI’s focus now, so that’s maybe intentional), but it doesn’t seem useful for understanding what’s really going on.

So yeah.
Why a single necessary and sufficient policy? What if the most realistic way of helping everyone is several policies that are by themselves insufficient, but together sufficient? Doesn’t this focus us on dramatic actions unhelpfully, in the same way that a “pivotal act” arguably so focuses us?
I agree the phrasing here is maybe bad, but I think it’s generally accepted that “X and Y” is a policy when “X” and “Y” are independently policies, so I would expect a set of policies which are together sufficient to be an appropriate answer.
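A minimal way to make that explicit (notation mine, purely illustrative): write $S(P)$ for “policy $P$, if enacted, would be sufficient to prevent extinction under the model in question.” It can then simultaneously hold that

$$\neg S(X), \qquad \neg S(Y), \qquad S(X \wedge Y),$$

i.e., the conjunction of two individually insufficient policies can itself be the minimum sufficient policy, so a bundle of measures is a legitimate answer to the question as phrased.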
IME, a good way to cut through thorny disagreements on values or beliefs is to discuss concrete policies. Example: a guy and I were arguing about the value of “free speech” and getting nowhere. I then suggested the kind of mechanisms I’d like to see on social media. Suddenly, we were both on the same page and rapidly reached agreement on what to do. Robustly good policies/actions exist. So I’d bet that shifting discussion from “what is your p(doom)?” to “what are your preferred policies for x-risk?” would make for much more productive conversations.
(First, my background assumptions for this discussion: I fear AGI is reachable, the leap from AGI to ASI is short, and sufficiently robust ASI alignment is impossible in principle.)
Whose policy? A policy enforced by treaty at the UN? The policy of regulators in the US? An international treaty policy—enforced by which nations?
Given the assumptions above, and assuming AGI becomes imminent, then:
If AGI would require scaling multiple orders of magnitude above current frontier models, then I would say the minimum sufficient policy is a global, permanent halt, enforced by a joint China-US military treaty and tacit European cooperation. Imagine nuclear non-proliferation, but with less tolerance for rogue states.
If AGI is easy (say, if it can be adapted to existing models with a <$50 million training run, and the key insights are simple enough to fit in a few papers), then no policy may be sufficient, and humans may be doomed to an eventual loss of control.
Why a single necessary and sufficient policy? What if the most realistic way of helping everyone is several policies that are by themselves insufficient, but together sufficient?
Since my upstream model is “If we succeed in building AGI, then the road to ASI is short, and ASI very robustly causes loss of human control,” the core of my policy proposals is a permanent halt. How we would enforce a halt might be complicated or impossible. But at the end of the day, either someone builds it or nobody builds it. So the core policy is essentially binary.
The big challenges I see are:
There are a bunch of smart people who accurately understand that current LLMs fail in embarrassing ways, and who believe that AGI is too far away to be a serious issue. These people mostly see warnings of AGI as supporting (in their view) corrupt Silicon Valley hucksters. To make these people care about safety, they would need to be convinced that AGI might arrive in the next 10-20 years, and to feel it.
The people who do believe that AGI is possible in the near future are frequently seduced by various imagined benefits. These benefits may be either short-sighted visions of economic empire, or near-messianic visions of utopia. To make these people care about safety, they would need to be convinced that humans risk losing control, and that Skynet will not increase their quarterly revenue.
Of the people who believe that AGI is both possible in the near future and potentially very dangerous, many hope that there is some robust way to control multiple ASIs indefinitely. Convincing these people to (for example) support a halt would require convincing them that (a) alignment is extremely difficult or impossible, and (b) that there is actually some real-world way to halt. Otherwise, they may very reasonably default to plans like, “Try for some rough near-term ‘alignment’, and hope for some very lucky rolls of the dice.”
The central challenge is that nothing like AGI or ASI has ever existed. And building consensus around even concrete things with clear scientific answers (e.g., cigarettes causing lung cancer) can be very difficult once incentives are involved. And we currently have low agreement on how AGI might turn out, for both good and bad reasons. Humans (very reasonably) decline to follow long chains of hypotheticals; ignoring such chains is almost always a good heuristic.
So trying to optimize rhetorical strategies for multiple groups with very different basic opinions is difficult.
Seems reasonable but this feels like an object level answer to what I assumed was a more meta question. (Like, this answers what you would want in a policy, and I read 1a3orn’s question as why this question isn’t Typed in a clear and flexible enough way)
Yeah, that’s absolutely fair. I mostly gave my personal answers on the object level, and then I tried to generalize to the larger issue of why there’s no simple communication strategy here.