Progress and Prizes in AI Alignment

Edit: In case it’s not obvious, I have done limited research on AI alignment organizations, and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.

I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one “Superintelligence” later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some comparisons of their work with that of the other organizations working in this space. The only problem is… there really aren’t any.

MIRI recently announced a new research agenda focused on “agent foundations”. Yet even the Open Philanthropy Project, made up of people who at least share MIRI’s broad worldview, can’t decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn’t seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That’s it.

I think that there are two problems here:

  1. There’s no way to tell which current organization is going to make the most progress towards solving AI alignment.

  2. These organizations are likely to be very similar to each other, not least because they practically share a zip code. I don’t think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where’s the group of evo-psych-savvy philosophers who don’t know anything about computer science but are working to spell out an approximation of universal human moral intuitions?

It seems like there’s a meta-question that needs to be addressed, even before any work is actually done on AI alignment itself:

How do we evaluate progress in AI alignment?

Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they’ll make the most progress.

More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it.

For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:

  1. They usually attract a lot more effort than the prize money itself would pay for. Competitors often spend, in aggregate, 2-10 times the prize amount in their efforts to win the competition.

  2. The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.

So, why isn’t there an XPrize for AI safety? You need very clear guidelines to create an honest competition, like “build the cheapest spaceship that can take 3 people to 100 km and be reused within 2 weeks”. It doesn’t seem like we’re close to being able to formulate anything similar for AI alignment. It also seems that if anyone will have good ideas on the subject, it will be the people on this forum. So, what do y’all think?
Can we come up with creative ways to objectively measure some aspect of progress on AI safety, enough to set up a competition around it?