Announcement: AI alignment prize winners and next round

We (Zvi Mowshowitz, Vladimir Slepnev and Paul Christiano) are happy to announce that the AI Alignment Prize is a success. From November 3 to December 31 we received over 40 entries representing an incredible amount of work and insight. That’s much more than we dared to hope for, in both quantity and quality.

In this post we name six winners who will receive $15,000 in total, an increase from the originally planned $5,000.

We’re also kicking off the next round of the prize, which will run from today until March 31, under the same rules as before.

The winners

First prize of $5,000 goes to Scott Garrabrant (MIRI) for his post Goodhart Taxonomy, an excellent write-up detailing the possible failures that can arise when optimizing for a proxy instead of the actual goal. Goodhart’s Law is simple to understand, impossible to forget once learned, and applies equally to AI alignment and everyday life. While Goodhart’s Law is widely known, breaking it down in this new way seems very valuable.
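To make the proxy-versus-goal failure concrete, here is a minimal toy sketch (ours, not from Scott’s post) of one failure mode the taxonomy names, regressional Goodhart: when a proxy is the true goal plus noise, picking the option with the highest proxy score systematically overestimates how good that option really is. All names and numbers below are illustrative.

```python
import random

# Toy illustration: each candidate action has a true value and a noisy proxy
# score. Selecting the argmax of the proxy ("optimizing the proxy") favors
# actions whose proxy overstates their true value, so the realized true value
# falls short of what the proxy promised. This is a regressional Goodhart
# effect, one of the failure modes the taxonomy distinguishes.
random.seed(0)

def sample_action():
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, 1)  # proxy = goal + independent noise
    return true_value, proxy

candidates = [sample_action() for _ in range(10_000)]
best_true, best_proxy = max(candidates, key=lambda c: c[1])
print(f"proxy score of selected action: {best_proxy:.2f}")
print(f"true value of selected action:  {best_true:.2f}")  # systematically lower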

Five more participants receive $2,000 each:

  • Tobias Baumann (FRI) for his post Using Surrogate Goals to Deflect Threats. Adding failsafes to the AI’s utility function is a promising idea and we’re happy to see more detailed treatments of it.

  • Vanessa Kosoy (MIRI) for her work on Delegative Reinforcement Learning (1, 2, 3). Proving performance bounds for agents that learn goals from each other is obviously important for AI alignment.

  • John Maxwell (unaffiliated) for his post Friendly AI through Ontology Autogeneration. We aren’t fans of John’s overall proposal, but the accompanying philosophical ideas are intriguing on their own.

  • Alex Mennen (unaffiliated) for his posts on legibility to other agents and learning goals of simple agents. The first is a neat way of thinking about some decision theory problems, and the second is a potentially good step for real world AI alignment.

  • Caspar Oesterheld (FRI) for his post and paper studying which decision theories would arise from environments like reinforcement learning or futarchy. Caspar’s angle of attack is new and leads to interesting results.

We’ll be contacting each winner by email to arrange transfer of money.

We would also like to thank everyone who participated. Even if you didn’t get one of the prizes today, please don’t let that discourage you!

The next round

We are now announcing the next round of the AI alignment prize.

As before, we’re looking for technical, philosophical and strategic ideas for AI alignment, posted publicly between now and March 31, 2018. You can submit your entries in the comments here or by email to apply@ai-alignment.com. We may give feedback on early entries to allow improvement, though our ability to do this may become limited by the volume of entries.

The minimum prize pool this time will be $10,000, with a minimum first prize of $5,000. If the entries once again surpass our expectations, we will again increase that pool.

Thank you!

(Addendum: I’ve written a post summarizing the typical feedback we’ve sent to participants in the previous round.)