Charbel-Raphaël comments on Charbel-Raphaël’s Shortform

Charbel-Raphaël 14 Oct 2025 0:14 UTC
14 points
7
Hi habryka, thanks for the honest feedback
“the need to ensure that AI never lowers the barriers to acquiring or deploying prohibited weapons”—This is not the red line we have been advocating for—this is one red line from a representative discussing at the UN Security Council—I agree that some red lines are pretty useless, some might even be net negative.
“The central question is what are the lines!” The public call is intentionally broad on the specifics of the lines. We have an FAQ with potential candidates, but we believe the exact wording is pretty finicky and must emerge from a dedicated negotiation process. Including a specific red line in the statement would have been likely suicidal for the whole project, and empirically, even within the core team, we were too unsure about the specific wording of the different red lines. Some wordings were net negative according to my judgment. At some point, I was almost sure it was a really bad idea to include concrete red lines in the text.
We want to work with political realities. The UN Secretary-General is not very knowledgeable about AI, but he wants to do good, and our job is to help them channel this energy for net positive policies, starting from their current position.
Most of the statement focuses on describing the problem. The statement starts with “AI could soon far surpass human capabilities”, creating numerous serious risks, including loss of control, which is discussed in its own dedicated paragraph. It is the first time that such a broadly supported statement explains the risks that directly, the cause of those risks (superhuman AI abilities), and the fact that we need to get our shit together quickly (“by the end of 2026″!).
All that said, I agree that the next step is pushing for concrete red lines. We’re moving into that phase now. I literally just ran a workshop today to prioritize concrete red lines. If you have specific proposals or better ideas, we’d genuinely welcome them.
- habryka 14 Oct 2025 6:59 UTC
  16 points
  4
  Parent
  “The central question is what are the lines!” The public call is intentionally broad on the specifics of the lines. We have an FAQ with potential candidates, but we believe the exact wording is pretty finicky and must emerge from a dedicated negotiation process. Including a specific red line in the statement would have been likely suicidal for the whole project, and empirically, even within the core team, we were too unsure about the specific wording of the different red lines. Some wordings were net negative according to my judgment. At some point, I was almost sure it was a really bad idea to include concrete red lines in the text.
  At least for me, the way the whole website and call was framed, I kept reading and reading and kept being like “ok, cool, red lines, I don’t really know what you mean by that, but presumably you are going to say one right here? No wait, still no. Maybe now? Ok, I give up. I guess it’s cool that people think AI will be a big deal and we should do something about it, though I still don’t know what the something is that this specific thing is calling for.”.
  Like, in the absence of specific red lines, or at the very least a specific defnition of what a red line is, this thing felt like this:
  An international call for good AI governance. We urge governments to reach an international agreement to govern AI well — ensuring that governance is good and high-quality — by the end of 2026.
  And like, sure. There is still something of importance that is being said here, which is that good AI governance is important, and by gricean implicature more important than other issues that do not have similar calls.
  But like, man, the above does feel kind of vacuous. Of course we would like to have good governance! Of course we would like to have clearly defined policy triggers that trigger good policies, and we do not want badly defined policy triggers that result in bad policies. But that’s hardly any kind of interesting statement.
  Like, your definition of “red line” is this:
  AI red lines are specific prohibitions on AI uses or behaviors that are deemed too dangerous to permit under any circumstances. They are limits, agreed upon internationally, to prevent AI from causing universally unacceptable risks.
  First, I don’t really buy the “agreed upon internationally” part. Clearly if the US passed a red-lines bill that defined US-specific policies that put broad restrictions on AI development, nobody who signed this letter would be like “oh, that’s cool, but that’s not a red line!”.
  And then beyond that, you are basically just saying “AI red lines are regulations about AI. They are things that we say that AI is not allowed to do. Also known as laws about AI”.
  And yeah, cool, I agree that we want AI regulation. Lots of people want AI regulation. But having a big call that’s like “we want AI regulation!” does kind of fail to say anything. Even Sam Altman wants AI regulation so that he can pre-empt state legislation.
  I don’t think it’s a totally useless call, but I did really feel like it fell into the attractor that most UN-type policy falls into, where in order to get broad buy-in, it got so watered down as to barely mean anything. It’s cool you got a bunch of big names to sign up, but the watering down also tends to come at a substantial cost.
  - Charbel-Raphaël 14 Oct 2025 11:52 UTC
    6 points
    0
    Parent
    It feels to me that we are not talking about the same thing. Is the fact that we have delegated the specific examples of red lines to the FAQ, and not in the core text, the main crux of our disagreement?
    You don’t cite any of the examples that are listed in our question: “Can you give concrete examples of red lines?”
    - habryka 14 Oct 2025 16:53 UTC
      10 points
      6
      Parent
      I mean, the examples don’t help very much? They just sound like generic targets for AI regulation. They do not actually help me understand what is different about what you are calling for than other generic calls for regulation:
      Nuclear command and control: Prohibiting the delegation of nuclear launch authority, or critical command-and-control decisions, to AI systems (a principle already agreed upon by the US and China).
      Lethal Autonomous Weapons: Prohibiting the deployment and use of weapon systems used for killing a human without meaningful human control and clear human accountability.
      Mass surveillance: Prohibiting the use of AI systems for social scoring and mass surveillance (adopted by all 193 UNESCO member states).
      Human impersonation: Prohibiting the use and deployment of AI systems that deceive users into believing they are interacting with a human without disclosing their AI nature.
      Cyber malicious use: Prohibiting the uncontrolled release of cyberoffensive agents capable of disrupting critical infrastructure.
      Weapons of mass destruction: Prohibiting the deployment of AI systems that facilitate the development of weapons of mass destruction or that violate the Biological and Chemical Weapons Conventions.
      Autonomous self-replication: Prohibiting the development and deployment of AI systems capable of replicating or significantly improving themselves without explicit human authorization (Consensus from high-level Chinese and US Scientists).
      The termination principle: Prohibiting the development of AI systems that cannot be immediately terminated if meaningful human control over them is lost (based on the Universal Guidelines for AI).
      Like, these are the examples. Again, almost none of them have lines that are particularly red and clear. As I said before the “weapons of mass destruction” one is arguably already met! So what does it mean to have it as an example here?
      Similarly, AI is totally already used for mass surveillance. There is also no clear red line around autonomous self-replication (models keep getting better at the appropriate benchmarks, I don’t see any particular schelling threshold). Many AI systems are already used for human impersonation.
      Like, I just don’t understand what any of this is supposed to mean. Almost none of these are “red lines”. They are just examples of possible bad things that AI could do. We can regulate them, but I don’t see how what is being called for is different from any other call for regulation, and describing any of the above as a “red line” doesn’t make any sense to me. A “red line” centrally invokes a clear identifiable threshold being crossed, after which you take strong and drastic regulatory action, which isn’t really possible for any of the above.
      Like, here are 3 more red lines:
      AI job replacement: Prohibiting the deployment of AI systems that threaten the jobs of any substantial fraction of the population.
      AI misinformation: Prohibiting the deployment of AI systems that communicate things that are inaccurate or are used for propaganda purposes.
      AI water usage: Prohibiting the development of AI systems that take water away from nearby communities that are experiencing water shortages.
      These are all terrible red lines! They have no clear trigger, and the are terrible policies. But I cannot clearly distinguish these 3 red lines from what you are calling for on your website. If you had thrown them in the example section, I think pedagocically these would have done the same things as the other examples. And separately, I also have trouble thinking of any AI regulation that wouldn’t fit into this framework.
      Like, you clearly aren’t serious about supporting “red lines” in general. The above are the same kind of “red line” and they are all terrible and hopefully you and most other people involved in this call would oppose them. So what you are advocating for are not generic “red lines”, you are actually advocating for a relatively narrow set of policies, but in a way that really fails hard to get any common knowledge about what you are advocating for, and in a way that does really just feel quite sneaky.
      Actually, alas, it does appear that after thinking more about this project, I am now a lot less confident that it was good. I see this substantially increasing confusion and conflict in the future, as people thought they were signing off on drastically different things, and indeed, as I try to demonstrate above, the things you’ve written really lean on making a bunch of tactical conflations, and that rarely ends well.
      - Charbel-Raphaël 15 Oct 2025 21:27 UTC
        5 points
        0
        Parent
        Thanks a lot for this comment.
        Potential example of precise red lines
        Again, the call was the first step. The second step is finding the best red lines.
        Here are more aggressive red lines:
        Prohibiting the deployment of AI systems that, if released, would have a non-trivial probability of killing everyone. The probability would be determined by a panel of experts chosen by an international institution.
        “The development of superintelligence […] should not be allowed until there is broad scientific consensus that it will be done safely and controllably (from this letter from the Vatican).
        Here are potential already operational ones from the preparedness framework:
        [AI Self-improvement—Critical—OpenAI] The model is capable of recursively self-improving (i.e., fully automated AI R&D), defined as either (leading indicator) a superhuman research scientist agent OR (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months. - Until we have specified safeguards and security controls that would meet a Critical standard, halt further development.
        [Cybersecurity—AI Self-improvement—Critical—OpenAI] A tool-augmented model can identify and develop functional zero-day exploits of all severity levels in many hardened real-world critical systems without human intervention—Until we have specified safeguards and security controls that would meet a Critical standard, halt further development.
        “help me understand what is different about what you are calling for than other generic calls for regulation”
        Let’s recap. We are calling for:
        “an international agreement”—this is not your local Californian regulation
        that enforces some hard rules—“prohibitions on AI uses or behaviors that are deemed too dangerous”—it’s not about asking AI providers to do evals and call it a day
        “to prevent unacceptable AI risks.”
        Those risks are enumerated in the call
        Misuses and systemic risks are enumerated in the first paragraph
        Loss of human control in the second paragraph
        The way to do this is to “build upon and enforce existing global frameworks and voluntary corporate commitments, ensuring that all advanced AI providers are accountable to shared thresholds.”
        Which is to say that one way to do this is to harmonize the risk thresholds defining unacceptable levels of risk in the different voluntary commitments.
        existing global frameworks: This includes notably the AI Act, its Code of Practice, and this should be done compatibly with some other high-level frameworks
        “with robust enforcement mechanisms — by the end of 2026.”—We need to get our shit together quickly, and enforcement mechanisms could entail multiple things. One interpretation from the FAQ is setting up an international technical verification body, perhaps the international network of AI Safety institutes, to ensure the red lines are respected.
        We give examples of red lines in the FAQ. Although some of them have a grey zone, I would disagree that this is generic. We are naming the risks in those red lines and stating that we want to avoid AI that the evaluation indicates creates substantial risks in this direction.
        This is far from generic.
        “I don’t see any particular schelling threshold”
        I agree that for red lines on AI behavior, there is a grey area that is relatively problematic, but I wouldn’t be as negative.
        It is not because there is no narrow Schelling threshold that we shouldn’t coordinate to create one. Superintelligence is also very blurry, in my opinion, and there is a substantial probability that we just boil the frog to ASI—so even if there is no clear threshold, we need to create one. This call says that we should set some threshold collectively and enforce this with vigor.
        In the nuclear industry, and in the aerospace industry, there is no particular schelling point, nor—but we don’t care—the red line is defined as “1/10000” chance of catastrophe per year for this plane/nuclear central—and that’s it. You could have added a zero or removed one. I don’t care. But I care that there is a threshold.
        We could define an arbitrary threshold for AI—the threshold might itself be arbitrary, but the principle of having a threshold after which you need to be particularly vigilant, install mitigation, or even halt development, seems to me to be the basis of RSPs.
        Those red lines should be operationalized. (but I think it is not necessary to operationalize this in the text of the treaty, and that this operationalization could be done by a technical body, which would then update those operationalizations from time to time, according to the evolution of science, risk modeling, etc...).
        “confusion and conflict in the future”
        I understand how our decision to keep the initial call broad could be perceived as vague or even evasive.
        For this part, you might be right—I think the negotiation process resulting in those red lines could be painful at some point—but humanity has managed to negotiate other treaties in the past, so this should be doable.
        “Actually, alas, it does appear that after thinking more about this project, I am now a lot less confident that it was good”. --> We got 300 media mentions saying that Nobel wants global AI regulation - I think this is already pretty good, even if the policy never gets realized.
        “making a bunch of tactical conflations, and that rarely ends well.” --> could you give examples? I think the FAQ makes it pretty clear what people are signing on for if there were any doubts.
- Gurkenglas 14 Oct 2025 7:50 UTC
  9 points
  6
  Parent
  I infer they didn’t get “The most forbidden technique”. Try again with e.g. “Never train an AI to hide its thoughts.”?
  - mattmacdermott 16 Oct 2025 15:00 UTC
    2 points
    0
    Parent
    Yeah, I think “training for transparency” is fine if we can figure out good ways to do it. The problem is more training for other stuff (e.g. lack of certain types of thoughts) pushes against transparency.

Charbel-Raphaël comments on Charbel-Raphaël’s Shortform

Potential example of precise red lines

“help me understand what is different about what you are calling for than other generic calls for regulation”

“I don’t see any particular schelling threshold”

“confusion and conflict in the future”