Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns

EDIT: The competition is now closed, thanks to everyone who participated! Rohin's posterior distribution is here, and winners are in this comment.

In this competition, we (Ought) want to amplify Rohin Shah's forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:

  1. Try to update Rohin's thinking via comments (for example, comments including reasoning, distributions, and information sources). If you don't want your comment to be considered for the competition, label it 'aside'.

  2. Predict what his posterior distribution for the question will be after he has read all the comments and reasoning in this thread.

The competition will close on Friday, July 31st. To participate, create your prediction on Elicit, click 'Save Snapshot to URL,' and post the snapshot link in a comment on this post. You can provide your reasoning in the 'Notes' section of Elicit or in your LessWrong comment. You should have a low bar for making predictions – they don't have to be perfect.

Here is Rohin's prior distribution on the question. His reasoning for the prior is in this comment. Rohin spent ~30 minutes creating this distribution based on the beliefs and evidence he already has. He will spend 2-5 hours generating a posterior distribution.

Click here to create your distribution

We will award two $200 prizes, in the form of Amazon gift cards:

  1. Most accurate prediction: We will award $200 to the most accurate prediction of Rohin's posterior distribution submitted through an Elicit snapshot. This will be determined by estimating the KL divergence between Rohin's final distribution and others' distributions (see the sketch after this list). If you post more than one snapshot, either your most recent snapshot or the one you identify as your final submission will be evaluated.

  2. Update to thinking: Rohin will rate each comment from 0 to 5 depending on how much the reasoning updated his thinking. We will randomly select one comment with probability proportional to its rating (so a comment rated 5 would be 5 times more likely to receive the prize than a comment rated 1), and the poster of that comment will receive the $200 prize.
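
To make the prize mechanics concrete, here is a minimal sketch of how both could be computed, assuming submitted distributions are represented as probability masses over a shared set of year bins. The function names, bin choices, and toy numbers are illustrative assumptions, not Ought's actual evaluation code.

```python
import random

import numpy as np


def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) for two discrete distributions over the same bins.

    Lower is better when p is Rohin's posterior and q is a submission.
    """
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    q = np.clip(q, eps, None)   # avoid log(0) where a submission puts no mass
    mask = p > 0                # bins with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def pick_comment_prize(ratings, rng=None):
    """Select one comment with probability proportional to its 0-5 rating."""
    rng = rng or random.Random()
    eligible = [c for c, r in ratings.items() if r > 0]
    weights = [ratings[c] for c in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0]


# Toy example: probability mass over three year-range bins (made-up numbers).
rohin_posterior = [0.2, 0.5, 0.3]
submission = [0.3, 0.4, 0.3]
print(kl_divergence(rohin_posterior, submission))   # lower = closer to Rohin

print(pick_comment_prize({"comment_a": 5, "comment_b": 1}))
```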

Motivation

This project is similar in spirit to amplifying epistemic spot checks and other work on scaling up individual judgment through crowdsourcing. As in these projects, we're hoping to learn about mechanisms for delegating reasoning, this time in the forecasting domain.

The objective is to learn whether mechanisms like this could save people like Rohin work. Rohin wants to know: What would I think if I had more evidence and knew more arguments than I currently do, but still followed the sorts of reasoning principles that I'm unlikely to revise in the course of a comment thread? In real-life applications of amplified forecasting, Rohin would evaluate the arguments in depth and form his own posterior distribution only 1 time out of 10; the other 9 times out of 10, he'd just skim the key arguments and adopt the predicted posterior as his new view.

Question specification

The question is: When will a majority of AGI researchers agree with safety concerns?

Suppose that every year I (Rohin) talk to every top AI researcher about safety (I'm not explaining safety; I'm simply getting their beliefs, perhaps guiding the conversation to the safety concerns in the alignment community). After talking to X, I evaluate:

  1. (Yes / No) Is X's work related to AGI? (AGI safety counts)

  2. (Yes / No) Does X broadly understand the main concerns of the safety community?

  3. (Yes / No) Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?

I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, *, *) (i.e. the proportion of AGI-related top researchers who are aware of safety concerns and think we shouldn't build superintelligent AGI before solving them). In how many years will this fraction be >= 0.5?
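
As an illustration of this computation, here is a small sketch that tallies the fraction from per-researcher answers, assuming each response is recorded as three booleans. The data and variable names are hypothetical.

```python
# Each tuple: (agi_related, understands_concerns, agrees_unsolved_concern_blocks_agi)
# Hypothetical survey responses, for illustration only.
responses = [
    (True, True, True),
    (True, True, False),
    (True, False, False),
    (False, True, True),   # not AGI-related, so excluded from the denominator
]

agi_related = [r for r in responses if r[0]]
yes_yes_yes = [r for r in agi_related if r[1] and r[2]]

fraction = len(yes_yes_yes) / len(agi_related)   # #(Yes, Yes, Yes) / #(Yes, *, *)
print(fraction, fraction >= 0.5)                 # the question asks when this first holds
```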

For reference, if I were to run this evaluation now, I would be looking for an understanding of reward gaming, instrumental convergence, and the challenges of value learning, but would not be looking for an understanding of wireheading (because I'm not convinced it's a problem we need to worry about) or inner alignment (because the safety community hasn't converged on the importance of inner alignment).

We'll define the set of top AI researchers somewhat arbitrarily as the top 1000 AI researchers in industry by salary and the top 1000 AI researchers in academia by citation count.

If the fraction never reaches 0.5 (e.g. because before it does, we build superintelligent AGI and it kills us all, or it is perfectly benevolent and everyone realizes there weren't any safety concerns), the question resolves as >2100.

Interpret this reasonably (e.g. a comment to the effect of "your survey will annoy everyone and so they'll be against safety" will be ignored even if true, because it's overfitting to the specific counterfactual survey proposed here and is clearly irrelevant to the spirit of the question).

Additional information

Rohin Shah is an AI Safety researcher at the Center for Human-Compatible AI (CHAI). He also publishes the Alignment Newsletter. Here is a link to his website where you can find more information about his research and views.

You are welcome to share a snapshot distribution of your own beliefs, but make sure to specify that the snapshot contains your own beliefs and not your prediction of Rohin's beliefs (snapshots of your own beliefs will not be evaluated for the competition).