Reflections on AI Timelines Forecasting Thread

It’s been exciting to see people engage with the AI forecasting thread that Ben, Daniel, and I set up! The thread was inspired by Alex Irpan’s AGI timeline update, and our hypothesis that visualizing and comparing AGI timelines could generate better predictions. Ought has been working on the probability distribution tool, Elicit, and it was awesome to see it in action.

14 users shared their AGI timelines. Below are a number of their forecasts overlaid, along with an aggregation of all the forecasts.

Comparison of 6 top-voted forecasts

Aggregation, weighted by votes

The thread generated some interesting learnings about AGI timelines and forecasting. Here I’ll discuss my thoughts on the following:

  • The object-level discussion of AGI timelines

  • How much people changed their minds and why

  • Learnings about forecasting

  • Open questions and next steps

AGI timelines

Summary of beliefs

We calculated an aggregation of the 14 forecasts weighted by the number of votes each comment with a forecast received. The question wasn’t precisely specified (people forecasted based on slightly different interpretations) so I’m sharing these numbers mostly for curiosity’s sake, rather than to make a specific claim about AGI timelines.

  • Aggregated median date: June 20, 2047

  • Aggregated most likely date: November 2, 2033

  • Earliest median date of any forecast: June 25, 2030

  • Latest median date of any forecast: After 2100
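For readers who want to experiment with this kind of vote-weighted aggregation, here’s a minimal sketch of one way it could be done. The forecasts, vote counts, and year grid below are made-up placeholders (and mass after 2100 is ignored), so treat this as an illustration of the weighting idea rather than a reproduction of our exact calculation.

```python
import numpy as np

# Toy vote-weighted aggregation of AGI-timeline forecasts.
# Forecasts and vote counts are invented for illustration only.
years = np.arange(2021, 2101)

def discretized_normal(mean, std):
    """Crude stand-in for an Elicit-style distribution, as a pmf over years."""
    pmf = np.exp(-0.5 * ((years - mean) / std) ** 2)
    return pmf / pmf.sum()

forecasts = [discretized_normal(2035, 8),
             discretized_normal(2050, 20),
             discretized_normal(2080, 15)]
votes = np.array([10.0, 6.0, 3.0])

weights = votes / votes.sum()                        # normalize votes into weights
aggregate = sum(w * f for w, f in zip(weights, forecasts))

cumulative = np.cumsum(aggregate)
print("Aggregated median year:", years[np.searchsorted(cumulative, 0.5)])
print("Aggregated most likely year:", years[np.argmax(aggregate)])
```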

Emergence of categories

I was pleasantly surprised by the categories of assumptions that emerged. Here are some themes in how people structured their reasoning:

  • AGI from current paradigm (2023 – 2033)

    • GPT-N gets us to AGI

    • GPT-N + improvements within existing paradigm gets us to AGI

  • AGI from paradigm shift (2035 – 2060)

    • We need fundamental technical breakthroughs

      • Quantum computing

      • Other new paradigms

  • AGI after 2100, or never (2100+)

    • We decide not to build AGI

      • We decide to build tool AI / CAIS instead

      • We move into a stable state

    • It’s harder than we expect

      • It’s hard to get the right insights

      • We won’t have enough compute by 2100

    • We can’t build AGI

      • There’s a catastrophe that stops us from being able to build AGI

  • Outside-view reasoning

    • With 50% probability, things will last twice as long as they already have (a toy version of this calculation follows the list)

    • We can extrapolate from the rate of reaching past AI milestones
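The “twice as long as they already have” bullet is a version of the delta-t (Copernican) argument: if we’re at a uniformly random point in the lifetime of the effort toward AGI, there’s a 50% chance we’re in the first half, so at least as much time remains as has already passed. Below is a back-of-the-envelope sketch; the 1956 start date for the reference class, and the identification of “the effort ends” with “AGI arrives”, are my assumptions for illustration, not claims from the thread.

```python
# Toy delta-t outside-view calculation (illustrative assumptions throughout).
# Assume AI research started in 1956 and we're at a uniformly random fraction f
# of its total lifetime; then remaining years satisfy P(remaining <= x) = x / (past + x).
past = 2020 - 1956  # years elapsed so far, under the assumed start date

def p_done_within(years_from_now: float) -> float:
    return years_from_now / (past + years_from_now)

print(p_done_within(past))           # 0.5: even odds at least as much time remains as has passed
print(round(p_done_within(30), 2))   # rough chance of resolution within 30 years under this model
```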

When sharing their forecasts, people associated these assumptions with a corresponding date interval for when we would see AGI. I took the median lower bound and median upper bound for each assumption to give a sense of what people are expecting if each assumption is true. Here’s a spreadsheet with all of the assumptions. Feel free to make a copy of the spreadsheet if you want to play around and make edits.

Did this thread change people’s minds?

One of the goals of making public forecasts is to help people identify disagreements and resolve cruxes. The number of people who updated is one measure of how well this format achieves this goal.

Two people explicitly updated in comments on the thread (Ben Pace and Ethan Perez), and several others updated outside of it. Here are some characteristics of the thread that caused people to update (based on conversations and inference from comments):

  • It was easy to notice surprising probabilities. In most forecasts, Elicit’s bin interface meant probabilities were linked to specific assumptions. For example, it was easy to disagree with Ben Pace’s specific belief that with 30% probability, we’d reach a stable state and therefore wouldn’t get AGI before 2100. Seeing a visual image of people’s distributions also made surprising beliefs (like sharp peaks) easy to spot.

  • Visual comparison provided a sense check. It was easy to verify whether you had too little or too much uncertainty compared to others.

  • Seeing many people’s beliefs provides new information. Separate from the information provided by people’s reasoning, there’s information in how many people support certain viewpoints. For example, multiple people placed non-trivial probability mass on the possibility that we could get AGI from scaling GPT-3.

  • The thread catalyzed conversations outside of LessWrong.

Learnings about forecasting

Vaguely defining the question worked surprisingly well

The question in this thread (“Timeline until human-level AGI”) was defined much less precisely than similar Metaculus questions. This meant people were able to forecast using their preferred interpretation, which provided more information about the range of possible interpretations and sources of disagreement at the interpretation level.

A good next step would be to create more consensus on the most productive interpretation for AGI timeline predictions.

Value of a template for predictions

When people make informal predictions about AGI, they often define their own intervals and ways of specifying probabilities (e.g. ‘30% probability by 2035’, or ‘highly likely by 2100’). For example, this list of predictions shows how vague a lot of timeline predictions are.

Having a standard template for predictions forces people to express numerical beliefs across an entire range. This makes it easier to compare predictions and compute disagreements across any range (e.g. this bet suggestion based on finding the earliest range with substantial disagreement). I’m curious how much more information we can capture over time by encouraging standardized predictions.
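As an illustration of what “computing disagreements across any range” could look like mechanically, here’s a small sketch that scans two standardized forecasts (expressed as cumulative probability of AGI by each year) for the earliest year where they differ by more than a threshold. The forecasts, the 15-percentage-point threshold, and the function name are made-up assumptions for the example, not the method behind the linked bet suggestion.

```python
# Illustrative only: find the earliest year where two standardized forecasts
# (cumulative probability of AGI by that year) disagree by >= `threshold`.
def earliest_disagreement(years, cdf_a, cdf_b, threshold=0.15):
    for year, a, b in zip(years, cdf_a, cdf_b):
        if abs(a - b) >= threshold:
            return year, a, b
    return None  # no range with substantial disagreement

years = list(range(2025, 2101, 5))
optimist = [min(1.0, 0.04 * (i + 1)) for i in range(len(years))]  # toy cumulative forecast
skeptic  = [min(1.0, 0.01 * (i + 1)) for i in range(len(years))]  # toy cumulative forecast
print(earliest_disagreement(years, optimist, skeptic))
```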

Creating AGI forecasting frameworks

Ought’s mission is to apply ML to complex reasoning. A key first step is making reasoning about the future explicit (for example, by decomposing the components of a forecast, isolating assumptions, and putting numbers to beliefs) so that we can then automate parts of the process. We’ll share more about this in a blog post that’s coming soon!

In this thread, it seemed like a lot of people built their own forecasting structure from scratch. I’m excited about leveraging this work to create structured frameworks that people can start with when making AGI forecasts. This has the benefits of:

  • Avoiding replication of cognitive work

  • Clearly isolating the assumptions that people disagree with

  • Generating more rigorous reasoning by encouraging people to examine the links between different components of a forecast and make them explicit

  • Providing data that helps us automate the reasoning process

Here are some ideas for what this might look like:

  • Decomposing the question more comprehensively based on the categories outlined above

    • For example, creating your overall distribution by calculating: P(scaling hypothesis is true) × (distribution for when we will get AGI | scaling hypothesis is true) + P(need paradigm shift) × (distribution for when we will get AGI | need paradigm shift) + P(something stops us) × (distribution for when we will get AGI | something stops us). A small sketch of this mixture calculation appears after this list.

  • Decomposing AGI timelines into the factors that will influence them

    • For example, compute or investment

  • Inferring distributions from easy questions

    • For example, asking questions like “If the scaling hypothesis is true, what’s the mean year we get AGI?” and using the answers to infer people’s distributions
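Here’s a minimal sketch of the mixture decomposition from the first bullet above, written as a small Monte Carlo simulation. The scenario probabilities, the conditional distributions, and the choice to lump “something stops us” into “after 2100 / never” are all placeholder assumptions for illustration.

```python
import random

# Overall AGI-timeline distribution as a probability-weighted mixture of
# scenario-conditional distributions. All numbers are illustrative placeholders.
scenarios = [
    # (P(scenario), sampler for the AGI year conditional on that scenario)
    (0.3, lambda: random.gauss(2032, 5)),    # scaling hypothesis is true
    (0.5, lambda: random.gauss(2050, 12)),   # a paradigm shift is needed
    (0.2, lambda: float("inf")),             # something stops us: after 2100 / never
]

def sample_agi_year():
    r, cumulative = random.random(), 0.0
    for prob, sampler in scenarios:
        cumulative += prob
        if r < cumulative:
            return sampler()
    return scenarios[-1][1]()   # guard against floating-point rounding

draws = [sample_agi_year() for _ in range(100_000)]
print("P(AGI by 2100):", sum(year <= 2100 for year in draws) / len(draws))
print("Median sampled year:", sorted(draws)[len(draws) // 2])
```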

What’s next? Some open questions

I’d be really interested in hearing other people’s reflections on this thread.

Questions I’m curious about

  • How was the experience for other people who participated?

  • What do people who didn’t participate but read the thread think?

  • What updates did people make?

  • What other questions would be good to make forecasting threads on?

  • What else can we learn from information in this thread, to capture the work people did?

  • How can Elicit be more helpful for these kinds of predictions?

  • How else do you want to build on the conversation started in the forecasting thread?

Ideas we have for next steps

  • Running more forecasting threads on other x-risks / catastrophic risks. For example:

    • When will humanity go extinct from global catastrophic biological risks?

    • How many people will die from nuclear war before 2200?

    • When will humanity go extinct from asteroids?

    • By 2100, how many people will die for reasons that would not have occurred if we had solved climate change by 2030?

  • More decomposition and framework creation for AGI timeline predictions

    • We’re working on making Elicit as useful as we can for this!