(First, my background assumptions for this discussion: I fear AGI is reachable, the leap from AGI to ASI is short, and sufficiently robust ASI alignment is impossible in principle.)
Whose policy? A policy enforced by treaty at the UN? The policy of regulators in the US? An international treaty policy—enforced by which nations?
Given the assumptions above, and assuming AGI becomes imminent, then:
If AGI would require scaling multiple orders of magnitude above current frontier models, then I would say the minimum sufficient policy is a global, permanent halt, enforced by a joint China-US military treaty and tacit European cooperation. Imagine nuclear non-proliferation, but with less tolerance for rogue states.
If AGI is easy (say, if it can be built by adapting existing models with a <$50 million training run, and the key insights are simple enough to fit in a few papers), then no policy may be sufficient, and humans may be doomed to an eventual loss of control.
Why a single necessary and sufficient policy? What if the most realistic way of helping everyone is several policies that are by themselves insufficient, but together sufficient?
Since my upstream model is “If we succeed in building AGI, then the road to ASI is short, and ASI very robustly causes loss of human control,” the core of my policy proposals is a permanent halt. Enforcing a halt might be complicated, or even impossible. But at the end of the day, either someone builds it or nobody builds it. So the core policy is essentially binary.
The big challenges I see are:
There are a bunch of smart people who accurately understand that current LLMs fail in embarrassing ways, and who believe that AGI is too far away to be a serious issue. These people mostly see warnings of AGI as supporting (in their view) corrupt Silicon Valley hucksters. To make these people care about safety, they would need to be convinced that AGI might arrive in the next 10-20 years, and to feel it.
The people who do believe that AGI is possible in the near future are frequently seduced by various imagined benefits, whether short-sighted visions of economic empire or near-messianic visions of utopia. To make these people care about safety, they would need to be convinced that humans risk losing control, and that Skynet will not increase their quarterly revenue.
Out of the people who believe both that AGI is possible in the near future and that it might be very dangerous, many hope that there is some robust way to control multiple ASIs indefinitely. Convincing these people to (for example) support a halt would require convincing them that (a) alignment is extremely difficult or impossible, and (b) there is actually some real-world way to halt. Otherwise, they may very reasonably default to plans like, “Try for some rough near-term ‘alignment’, and hope for some very lucky rolls of the dice.”
The central challenge is that nothing like AGI or ASI has ever existed. Building consensus around even concrete things with clear scientific answers (e.g., cigarettes causing lung cancer) can be very difficult once incentives are involved, and we currently have little agreement on how AGI might turn out, for both good and bad reasons. Humans (very reasonably) refuse to follow long chains of hypotheticals; that refusal is almost always a good heuristic.
So trying to optimize rhetorical strategies for multiple groups with very different basic opinions is difficult.
Seems reasonable but this feels like an object level answer to what I assumed was a more meta question. (Like, this answers what you would want in a policy, and I read 1a3orn’s question as why this question isn’t Typed in a clear and flexible enough way)
Yeah, that’s absolutely fair. I mostly gave my personal answers on the object level, and then I tried to generalize to the larger issue of why there’s no simple communication strategy here.