I see that the “International Treaties on AI” idea takes heavy inspiration from nuclear arms control agreements. However, in these discussions, nuclear arms control is usually pictured as a kind of solved problem, a thing of the past.
I think the validity of this heroic narrative arc, in which human civilization, faced with the existential threat of nuclear annihilation, came together and neatly contained the problem, is dubious.
In the grand scheme of things, nuclear weapons are still young. They’re still here and still very much threatening; the fact that we stopped focusing on them as much as we did during the Cold War has no bearing on the possibility of nuclear weapons being used in a future conflict.
In the same vein, an international AI capabilities limit regime isn’t the happy ending that the AI safety community perhaps thinks it is.
One key difference from nuclear weapons is that algorithmic improvements and hardware evolution will continue to lower the threshold for training dangerous AI systems in secret, by both rogue states and individuals. What then? Mass surveillance?
Note also: the last US-Russia nuclear arms-control treaty expires next week; far from neatly containing the problem, we’re watching an ongoing breakdown of decades-old norms. I’m worried.
In the same vein, an international AI capabilities limit regime isn’t the happy ending that the AI safety community perhaps thinks it is.
I appreciate the “perhaps” here.
I think it’s well understood by the people around here who want an international treaty that it isn’t a stable end state, because of algorithmic progress and the increasing risk of a catastrophic illegal training run over time, and that we need some off-ramp from that regime into some stable solution (presumably involving trustworthy AGIs).
Different people have different guesses about what that off ramp can or should be.
I think it’s well understood by the people around here who want an international treaty that it isn’t a stable end state
My impression is that, in the common narrative, nation states agreeing to limit training run sizes is presented as a kind of holy grail, achieved through the very arduous journey of trying to solve a difficult global coordination problem. It’s where the answer to “well, what should be done?” terminates.
I heard “stop the training runs”, but not “stop new algorithms” or “collectively roll back to 22nm lithography”.
From the online resources of IABIED:

This is why they advocate for a crash program in adult human intelligence enhancement—to very rapidly make people smart enough to get alignment right on the first try, before the international regime breaks down.
Further, the only other detailed, written plan that I’m aware of explicitly expects to be able to maintain the international capability-limiting regime for only about one decade, after which the plan is to hand off to trusted AIs. (I’m not citing that one since it’s not published yet.)
I’m not personally aware of anyone who thinks that an international ban or slowdown is a permanent equilibrium.
In the link:

The AI alignment problem does not look to us like it is fundamentally unsolvable.
I wonder what the basis for this belief is? Rice’s theorem suggests that there is no general algorithm for predicting the semantic properties of programs, and that the only way to know what a program does is to actually run it.
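For concreteness, here is the standard formulation being gestured at (a sketch in the usual notation, where φ_e denotes the partial computable function computed by program e):

```latex
% Rice's theorem (sketch). P is a "semantic property": a set of
% partial computable functions; \varphi_e is the partial computable
% function computed by program e. "Non-trivial" means some program's
% function is in P and some program's function is not.
\text{If } \exists e\,(\varphi_e \in P) \;\text{and}\; \exists e'\,(\varphi_{e'} \notin P),
\text{ then the index set } \{\, e \in \mathbb{N} : \varphi_e \in P \,\} \text{ is undecidable.}
```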
I doubt that the situation with AI systems is worse than that of nukes. While there are nukes in states as rogue as North Korea, I hope that AI development is bottlenecked on compute (which in turn is bottlenecked on TSMC and other compute factories): there is no known model which is better than Grok 3 and was trained with less than 1/300 of the compute of Grok 3. See also Aaron Scher’s analysis and the comments on it.