This seems like a good thing to advocate for. I’m disappointed that they don’t make any mention of extinction risk, but I think establishing red lines would be a step in the right direction.
At first, we hesitated a lot over whether or not to include the term “extinction.”
The final decision not to center the message on “extinction risk” was deliberate: it would have prevented most of the heads of state and organizations from signing. Our goal was to build the broadest and most influential coalition possible to advocate for international red lines, which is what’s most important to us.
By focusing on the concept of “losing meaningful human control,” we were able to achieve agreement on the precursor to most worst-case scenarios, including extinction. Advisors and early feedback from signatories also told us that this is a more concrete concept for policymakers and the public.
In summary, if you really want red lines to actually happen, adding the word “extinction” is not necessary and carries more costs than benefits in this text.
This is a very valuable clarification, and I agree[1]. I really appreciate your focus on policy feasibility and concrete approaches.
From my own experience, most people in the regulatory space outside AI Safety either lack sufficient background knowledge about timelines and existential risk to meaningfully engage with these concerns and commit to enforceable measures[2], or have some familiarity but become more skeptical due to the lack of consensus on probabilities, timelines, and definitions.
I will be following this initiative closely and promoting it to the best of my ability.
EDIT: I’ve signed with my institutional email and title.
For transparency: I knew about the red lines project before it was published. Furthermore, Charbel’s and CeSIA’s past work has shifted my own views on policy and international cooperation.
I expect that the popularity of IABIED and more involvement from AI Safety figures in policy will shift the Overton window in this regard.
When you say “prevented,” do you just mean it would have been generally off-putting, or is there something specific that you’re referring to here?
“I’m disappointed that they don’t make any mention of extinction risk”

Agree, but I wonder if extinction risk is just too vague, at the moment, for something like this. Absent a fast-takeoff scenario, AI doom probably does look something like the gradual and unchecked increase of autonomy mentioned in the call, and I’m not sure there’s enough evidence of a looming fast takeoff for it to be taken seriously.
I think the stakes are high enough that experts should firmly state, like Eliezer, that we should back off way before fast takeoff even seems like a possibility. But I see why that may be less persuasive to outsiders.
Fast takeoff is not takeover, and takeover is not extinction. For example, gradual disempowerment without a fast takeoff leads to takeover, which may lead to either extinction or permanent disempowerment, depending on the values of the AIs.
I think it’s quite plausible that AGIs merely take over frontier AI R&D de facto, with enough economic prosperity and human figureheads to ensure humanity’s complacency. And when superintelligence arrives later, humanity might notice that it’s left with a tiny, insignificant share of the resources of the reachable universe and no prospect at all of ever changing this, even on cosmic timescales.
Agree; I’m strongly in favor of using a term like “disempowerment-risk” over “extinction-risk” in communication to laypeople – I think the latter detracts from the more important question of preventing a loss of control and emphasizes the thing that happens after, which is far more speculative (and often invites the common “sci-fi scenario” criticism).
Of course, it doesn’t sound as flashy, but I think saying “we shouldn’t build a machine that takes control of our entire future” is sufficiently attention-grabbing.
I suppose the problem is that in most fast-takeoff scenarios there is little direct evidence before one happens, and one has to reason from priors.
Well, an unstoppable superintelligence paperclipping the entire planet is certainly a national security concern and a systematic human rights violation, I guess.
Jokes aside, some of the proposed red lines clearly do hint at that: no self-replication and immediate termination are safeguards against the AIs themselves, not just human misuse.