AI Alignment Prize: Round 2 due March 31, 2018


There are still several weeks left to enter the second round of the AI Alignment Prize. The official announcement of the second round, along with the first round’s winners, is here. We’ll be awarding a minimum of $10,000 for advances in technical, philosophical and/or strategic approaches to AI alignment: ensuring that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Last time we got such good entries that we tripled our announced prize pool, and we’d love to justify doing that again. You can enter by replying to this post, by replying to the original announcement of the second round, or by email.

AI alignment is wide open for both amateur and professional ideas. It’s one of the most important things humanity needs to work on, yet it receives almost no serious attention. Much of the low hanging fruit remains unpicked. We hope to help change that. Even for those who don’t produce new advances or win prizes, more time spent on this problem is time well spent.

Whether or not you’re considering entering, please help get the word out and remind people of the upcoming deadline. Please do repost this notice with a link back to the original (since people won’t be able to enter by replying to the copy). Entries for the second round are running behind the pace from the first round, and second times around risk being less sexy and exciting than the first, despite the increased prize pool. We also didn’t do enough to make it clear the contest was continuing with a larger prize pool.

Paul Christiano is also offering an additional $10,000 prize for the best evidence that his preferred approach to AI alignment, which is reflected in his posts and centered on iterated amplification, is doomed. That one is due on April 20.

Last November, Paul Christiano, Vladimir Slepnev and I announced the AI Alignment Prize for publicly posted work advancing our understanding of AI alignment. We were hopeful that a prize could be effective at motivating good work and the sharing of good work, but worried we wouldn’t get quality entries.

We need not have worried. On all fronts, it was a smashing success, exceeding all of our expectations for both quantity and quality. We got winning entries both from employees of organizations like FRI and MIRI, and from independent bloggers. Our original prize pool was $5,000 total, but we were so happy that we decided to award $15,000 total, including $5,000 to the first prize winner, Scott Garrabrant. Scott has gone on to write many more excellent posts, and has himself observed that were it not for the prize, his piece on Goodhart Taxonomy and many other pieces would likely have remained in his draft folder.

The other winners were Tobias Baumann for Using Surrogate Goals to Deflect Threats, Vadim Kosoy for Delegative Reinforcement Learning (1, 2, 3), John Maxwell for Friendly AI through Ontology Autogeneration, Alex Mennen for his posts on legibility to other agents and learning goals of simple agents, and Caspar Oesterheld for his post and paper studying which decision theories would arise from environments like reinforcement learning or futarchy.

It is easy to default to thinking that scientific progress comes from Official Scientific Papers published in Official Journals, whereas blog posts are people having fun or hashing out ideas. Not so! Many of the best ideas come from people writing on their own, on their own platforms. Encouraging them to take that seriously, and others to take them seriously in turn when they deserve it, and raising the prestige and status of such efforts, is one of our key goals. The winning entry from the first contest was a blog post breaking down a well-known concept, Goodhart’s Law, and giving us a better way to talk about it. Often the most important advances are simple things, done well.

Here are the rules of the contest, which are unchanged except for which posts to reply to, larger numbers and new dates:


Your entry must be published online for the first time between January 15 and March 31, 2018, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!

Your entry must be written by you, and submitted before 9pm Pacific Time on March 31, 2018. Submit your entries as links in the comments to the original announcement or to this post, or by email. We may provide feedback on early entries to allow improvement.

We will award a minimum of $10,000 to the winners. The first place winner will get at least $5,000. The second place winner will get at least $2,000. Other winners will get at least $1,000.

Entries will be judged subjectively. Final judgment will be by Paul Christiano. Prizes will be awarded on or before April 15, 2018.

What kind of work are we looking for?

AI alignment focuses on ways to ensure that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Many approaches to AI alignment deserve attention. These include technical and philosophical topics, as well as strategic research about related social, economic or political issues. A non-exhaustive list of technical and other topics can be found here.

We are not interested in research dealing with the dangers of existing machine learning systems, commonly called AI, that do not have smarter-than-human intelligence. These concerns are also understudied, but are not the subject of this prize except in the context of future smarter-than-human intelligence. We are also not interested in general AI research. We care about AI alignment, which may or may not also advance the cause of general AI research.

For further guidance on what we’re looking for, and because they’re worth reading and thinking about, check out the winners of the first round, which we linked to above.