Reviews of “Is power-seeking AI an existential risk?”

(Edited 10/14/22 to add Lifland review. Edited 7/10/23 to add Levinstein review. Edited 10/18/23 to add superforecaster reviews.)

Open Philanthropy solicited reviews of my draft report “Is power-seeking AI an existential risk?” (Edit: arXiv version here) from various sources. Where the reviewers allowed us to make their comments public in this format, links to these comments are below, along with some responses from me in blue.

  1. Leopold Aschenbrenner

  2. Ben Garfinkel

  3. Daniel Kokotajlo

  4. Ben Levinstein

  5. Eli Lifland

  6. Neel Nanda

  7. Nate Soares

  8. Christian Tarsney

  9. David Thorstad

  10. David Wallace

  11. Anonymous 1 (software engineer on an AI research team)

  12. Anonymous 2 (academic computer scientist)

The table below (spreadsheet link here) summarizes each reviewer’s probabilities and key objections.

[Screenshot of linked summary spreadsheet]

An academic economist focused on AI also provided a review, but they declined to make it public in this format.

Added 10/18/23: With funding from Open Philanthropy, Good Judgment also solicited reviews and forecasts from 21 superforecasters regarding the report; see here for a summary of the results. These superforecasters completed a survey very similar to the one completed by the other reviewers, except with an additional question about the “multiple stage fallacy.”[1] Their aggregated medians were:

Good Judgment has also prepared more detailed summaries of superforecaster comments and forecasts here (re: my report) and here (re: the other timelines and X-risk questions). See here for some brief reflections on these results, and here for a public spreadsheet with the individual superforecasters’ numbers and reviews (also screenshotted below).

  1. ^

    The new question (included in the final section of the survey) was:

    “One concern about the estimation method in the report is that the multi-premise structure biases towards lower numbers (this is sometimes called the “multi-stage fallacy”; see also Soares here). For example, forecasters might fail to adequately condition on all of the previous premises being true and to account for their correlations, or they might be biased away from assigning suitably extreme probabilities to individual premises.

    When you multiply through your probabilities on the individual premises, does your estimate differ significantly from the probability you would’ve given to “existential catastrophe by 2070 from worlds where all of 1-6 are true” when estimating it directly? If so, in what direction?”
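
    For a concrete sense of the worry, here is a minimal illustrative sketch in Python. The numbers are made up for the example (they are not taken from the report or from any reviewer’s responses); the point is just that modest shading on each premise compounds into a much lower multiplied-through estimate.

        from math import prod

        # Purely illustrative numbers; not taken from the report or any reviewer's responses.
        # Probabilities for six premises, each conditioned on all previous premises being true:
        conditioned = [0.8, 0.9, 0.85, 0.9, 0.95, 0.9]
        print(f"multiplied-through estimate: {prod(conditioned):.3f}")   # ~0.471

        # The "multiple stage fallacy" worry: if a forecaster shades each stage away from
        # extreme values (here, never above 0.75, e.g. from under-conditioning on earlier
        # premises), the product deflates even though no single stage looks unreasonable.
        shaded = [min(p, 0.75) for p in conditioned]
        print(f"shaded multiplied-through estimate: {prod(shaded):.3f}")  # ~0.178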