Thanks a lot for this comment.
Potential examples of precise red lines
Again, the call was the first step. The second step is finding the best red lines.
Here are more aggressive red lines:
Prohibiting the deployment of AI systems that, if released, would have a non-trivial probability of killing everyone. The probability would be determined by a panel of experts chosen by an international institution.
“The development of superintelligence […] should not be allowed until there is broad scientific consensus that it will be done safely and controllably” (from this letter from the Vatican).
Here are potentially already-operational ones, taken from OpenAI’s Preparedness Framework (a toy sketch of how the first one could be checked follows the list):
[AI Self-improvement—Critical—OpenAI] The model is capable of recursively self-improving (i.e., fully automated AI R&D), defined as either (leading indicator) a superhuman research scientist agent OR (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months. Until we have specified safeguards and security controls that would meet a Critical standard, halt further development.
[Cybersecurity—Critical—OpenAI] A tool-augmented model can identify and develop functional zero-day exploits of all severity levels in many hardened real-world critical systems without human intervention. Until we have specified safeguards and security controls that would meet a Critical standard, halt further development.
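To make this concrete, here is a toy sketch in Python (my own illustration, with assumed numbers and made-up names, not something from OpenAI or from the call) of how the lagging indicator of the self-improvement red line above could be turned into a mechanical check:

```python
# Toy sketch only: checking the lagging indicator of the self-improvement red line,
# i.e., a generational improvement in 1/5th of the 2024 wall-clock time, sustained for months.
# All names and the reading of "several months" are assumptions, not OpenAI's definitions.

BASELINE_2024_WEEKS = 20     # implied by the quote: "just 4 weeks" being 1/5th of ~20 weeks
CRITICAL_SPEEDUP = 5         # "1/5th the wall-clock time" corresponds to a 5x speedup
SUSTAINED_MONTHS_MIN = 3     # assumption: "several months" read as at least three

def crosses_self_improvement_red_line(observed_weeks: float, sustained_months: int) -> bool:
    """Return True if a generational improvement was achieved at >= 5x the 2024 pace,
    sustained for several months."""
    speedup = BASELINE_2024_WEEKS / observed_weeks
    return speedup >= CRITICAL_SPEEDUP and sustained_months >= SUSTAINED_MONTHS_MIN

# Example: a generational jump achieved in 4 weeks, sustained for 4 months -> red line crossed.
print(crosses_self_improvement_red_line(observed_weeks=4, sustained_months=4))  # True
```

The arithmetic is trivial; the hard part, which is exactly what a technical body would have to settle, is agreeing on what counts as a “generational model improvement” and who measures it.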
“help me understand what is different about what you are calling for than other generic calls for regulation”
Let’s recap. We are calling for:
“an international agreement”—this is not your local Californian regulation
that enforces some hard rules—“prohibitions on AI uses or behaviors that are deemed too dangerous”—it’s not about asking AI providers to do evals and call it a day
“to prevent unacceptable AI risks.”
Those risks are enumerated in the call:
Misuses and systemic risks in the first paragraph
Loss of human control in the second paragraph
The way to do this is to “build upon and enforce existing global frameworks and voluntary corporate commitments, ensuring that all advanced AI providers are accountable to shared thresholds.”
Which is to say that one way to do this is to harmonize the thresholds defining unacceptable levels of risk across the different voluntary commitments.
“existing global frameworks”: this notably includes the EU AI Act and its Code of Practice, and it should be done compatibly with other high-level frameworks.
“with robust enforcement mechanisms — by the end of 2026.”—We need to get our shit together quickly, and enforcement mechanisms could take several forms. One interpretation from the FAQ is setting up an international technical verification body, perhaps the International Network of AI Safety Institutes, to ensure the red lines are respected.
We give examples of red lines in the FAQ. Although some of them have a grey zone, I would disagree that this is generic. We are naming the risks in those red lines and stating that we want to avoid AI systems whose evaluations indicate substantial risks in those directions.
This is far from generic.
“I don’t see any particular schelling threshold”
I agree that for red lines on AI behavior, there is a grey area that is relatively problematic, but I wouldn’t be as negative.
The absence of a narrow Schelling threshold is not a reason to avoid coordinating to create one. Superintelligence is also very blurry, in my opinion, and there is a substantial probability that we just boil the frog all the way to ASI—so even if there is no clear threshold, we need to create one. This call says that we should set some threshold collectively and enforce it with vigor.
In the nuclear and aerospace industries there is no particular Schelling point either, but that doesn’t matter: the red line is defined as a “1/10,000” chance of catastrophe per year for a given plane or nuclear plant, and that’s it. You could have added a zero or removed one; I don’t care. But I do care that there is a threshold.
We could do the same for AI. The threshold itself might be arbitrary, but the principle of having a threshold beyond which you need to be particularly vigilant, install mitigations, or even halt development seems to me to be the basis of RSPs (Responsible Scaling Policies).
Those red lines should be operationalized (though I don’t think the operationalization needs to appear in the text of the treaty itself; it could be done by a technical body, which would update it from time to time as science, risk modeling, etc. evolve).
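To illustrate that structure, here is a rough sketch (arbitrary numbers and names of my own, not taken from the call, from any RSP, or from any regulator) of a quantitative red line with a graded response and a version field that a technical body could revise over time:

```python
from dataclasses import dataclass

@dataclass
class RedLineOperationalization:
    """Illustrative only: a versioned, quantitative red line that a technical body
    could update as science and risk modeling evolve."""
    version: str                            # e.g., revision date of the operationalization
    max_catastrophe_prob_per_year: float    # the (arbitrary) red-line threshold
    vigilance_fraction: float               # fraction of the threshold triggering extra scrutiny

    def decide(self, estimated_prob_per_year: float) -> str:
        """Map an estimated yearly catastrophe probability to a graded action."""
        if estimated_prob_per_year >= self.max_catastrophe_prob_per_year:
            return "halt further development/deployment"
        if estimated_prob_per_year >= self.vigilance_fraction * self.max_catastrophe_prob_per_year:
            return "heightened vigilance and mandatory mitigations"
        return "proceed under standard monitoring"

# An arbitrary 1/10,000-per-year threshold, echoing the nuclear/aerospace analogy above.
v1 = RedLineOperationalization(version="2026-01", max_catastrophe_prob_per_year=1e-4,
                               vigilance_fraction=0.1)
print(v1.decide(3e-5))   # heightened vigilance and mandatory mitigations
print(v1.decide(2e-4))   # halt further development/deployment
```

The numbers are placeholders; the point is only that the threshold, the response ladder, and the revision history can all be made explicit and auditable.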
“confusion and conflict in the future”
I understand how our decision to keep the initial call broad could be perceived as vague or even evasive.
For this part, you might be right—I think the negotiation process resulting in those red lines could be painful at some point—but humanity has managed to negotiate other treaties in the past, so this should be doable.
“Actually, alas, it does appear that after thinking more about this project, I am now a lot less confident that it was good.” --> We got 300 media mentions saying that Nobel laureates want global AI regulation. I think this is already pretty good, even if the policy never gets realized.
“making a bunch of tactical conflations, and that rarely ends well.” --> Could you give examples? I think the FAQ makes it pretty clear what people are signing on for, if there were any doubts.