Announcing the AI Alignment Prize

cousin_it 3 Nov 2017 15:47 UTC

95 points

Bounties (closed), AI, 2017-2019 AI Alignment Prize

Stronger-than-human artificial intelligence would be dangerous to humanity. It is vital that any such intelligence’s goals are aligned with humanity’s goals. Maximizing the chance that this happens is a difficult, important and under-studied problem.

To encourage more and better work on this important problem, we (Zvi Mowshowitz and Vladimir Slepnev) are announcing a $5000 prize for publicly posted work advancing understanding of AI alignment, funded by Paul Christiano.

This prize will be awarded based on entries gathered over the next two months. If the prize is successful, we will award further prizes in the future.

The prize is not backed by or affiliated with any organization.

Rules

Your entry must be published online for the first time between November 3 and December 31, 2017, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!

Your entry must be written by you, and submitted before 9pm Pacific Time on December 31, 2017. Submit your entries either as links in the comments to this post, or by email to apply@ai-alignment.com. We may provide feedback on early entries to allow improvement.
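Since comment timestamps on this page are shown in UTC while the deadline is stated in Pacific Time, a small conversion may help. The sketch below is only illustrative: it assumes the deadline falls under Pacific Standard Time (UTC-8, which applies in late December), and the helper names `deadline_utc` and `is_on_time` are made up for this example. Treating a comment's timestamp as the submission time is also an assumption.

```python
from datetime import datetime, timedelta, timezone

# Pacific Standard Time (UTC-8) applies on December 31; daylight saving is not in effect.
PST = timezone(timedelta(hours=-8), "PST")

# Deadline from the rules: 9pm Pacific Time on December 31, 2017.
DEADLINE = datetime(2017, 12, 31, 21, 0, tzinfo=PST)


def deadline_utc() -> datetime:
    """Return the submission deadline converted to UTC."""
    return DEADLINE.astimezone(timezone.utc)


def is_on_time(submitted_utc: datetime) -> bool:
    """True if a timezone-aware submission time is strictly before the deadline."""
    return submitted_utc < DEADLINE


print(deadline_utc().isoformat())  # 2018-01-01T05:00:00+00:00
```

Under these assumptions, an entry stamped 1 Jan 2018 3:41 UTC would still be before the deadline, while one stamped 05:00 UTC or later would not.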

We will award $5000 to between one and five winners. The first place winner will get at least $2500. The second place winner will get at least $1000. Other winners will get at least $500.
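As a quick check on how these minimums interact with the $5000 total, here is a rough sketch. It assumes the full $5000 is always paid out and that "other winners" means third through fifth place; the names `minimum_commitment` and `discretionary_amount` are hypothetical and not part of the rules.

```python
TOTAL_PRIZE = 5000

# Stated minimums: first place, second place, then up to three "other winners".
MINIMUMS = [2500, 1000, 500, 500, 500]


def minimum_commitment(num_winners: int) -> int:
    """Sum of the guaranteed minimums for the given number of winners."""
    if not 1 <= num_winners <= len(MINIMUMS):
        raise ValueError("the rules allow between one and five winners")
    return sum(MINIMUMS[:num_winners])


def discretionary_amount(num_winners: int) -> int:
    """Money left for the judges to distribute beyond the minimums."""
    return TOTAL_PRIZE - minimum_commitment(num_winners)


for n in range(1, 6):
    print(n, minimum_commitment(n), discretionary_amount(n))
```

With five winners the minimums add up to exactly $5000, so under these assumptions the split would be fully determined; with fewer winners the judges would have some amount left to allocate as they see fit.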

Entries will be judged subjectively. Final judgment will be by Paul Christiano. Prizes will be awarded on or before January 15, 2018.

What kind of work are we looking for?

AI Alignment focuses on ways to ensure that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Many approaches to AI Alignment deserve attention. This includes technical and philosophical topics, as well as strategic research about related social, economic or political issues. A non-exhaustive list of technical and other topics can be found here.

We are not interested in research dealing with the dangers of existing machine learning systems, commonly called AI, that do not have smarter-than-human intelligence. These concerns are also understudied, but are not the subject of this prize except in the context of future smarter-than-human intelligence. We are also not interested in general AI research. We care about AI alignment, which may or may not also advance the cause of general AI research.

(Addendum: the results of the prize and the rules for the next round have now been announced.)


Scott Garrabrant 30 Dec 2017 16:49 UTC
6 points
Here is my submission.
Thank you for motivating me to write this blog post I have been putting off for a while.
Disclaimer: If you want to only measure the contribution that came November or later, compare to this post, which has one fewer category, no names, fewer examples, nothing about mitigation, and worse presentation.
I think this is an important idea, so I appreciate feedback, especially about presentation.
- cousin_it 31 Dec 2017 12:51 UTC
  4 points
  Parent
  Acknowledged, and thank you! I think that’s a great post. If you want to make it better, the easiest way is by adding simple real world examples for each section.
- Scott Garrabrant 8 Jan 2018 15:13 UTC
  1 point
  Parent
  Another Disclaimer: The outline at the beginning was added after the deadline, thanks to Raemon and other people who provided examples.
Caspar Oesterheld 21 Dec 2017 16:10 UTC
6 points
You don’t mention decision theory in your list of topics, but I guess it doesn’t hurt to try.
I have thought a bit about what one might call the “implementation problem of decision theory”. Let’s say you believe that some theory of rational decision making, e.g., evidential or updateless decision theory, is the right one for an AI to use. How would you design an AI to behave in accordance with such a normative theory? Conversely, if you just go ahead and build a system in some existing framework, how would that AI behave in Newcomb-like problems?
There are two pieces that I uploaded/finished on this topic in November and December. The first is a blog post noting that futarchy-type architectures would, per default, implement evidential decision theory. The second is a draft titled “Approval-directed agency and the decision theory of Newcomb-like problems”.
For anyone who’s interested in this topic, here are some other related papers and blog posts:
- Another one I wrote: “Doing what has worked well in the past leads to evidential decision theory” (I updated this in December, but it was first written and uploaded in September, so it doesn’t count for the competition.)
- Albert and Heiner (2001): “An Indirect-Evolution Approach to Newcomb’s Problem”
- Meyer, Feldmaier and Shen (2016): “Reinforcement Learning in Conflicting Environments for Autonomous Vehicles”
So far, my research and the papers by others I linked have focused on classic Newcomb-like problems. One could also discuss how existing AI paradigms relate to other issues of naturalized agency, in particular self-locating beliefs and naturalized induction, though here it seems more as though existing frameworks just lead to really messy behavior.
Send comments to firstnameDOTlastnameATfoundational-researchDOTorg. (Of course, you can also comment here or send me a LW PM.)
- cousin_it 31 Dec 2017 12:56 UTC
  3 points
  Parent
  Caspar, thanks for the amazing entry! Acknowledged.
Ben Pringle 8 Nov 2017 0:27 UTC
6 points
I saw a talk earlier this year that mentioned this 2015 Corrigibility paper as a good starting point for someone new to alignment research. If that’s still true, I started writing up some thoughts on a possible generalization of the method in that paper.
Anyway, submitting this draft early to hopefully get some feedback whether I’m on the right track:
GeneralizedUtilityIndifference_Draft_Latest.pdf (edited)
The new version does better on sub-agent shutdown and eliminates the “managing the news” problem.
(Let me know if someone already thought of this approach!)
EDIT 2017-11-09: filled in the section on the $n$-action model.
- cousin_it 8 Nov 2017 9:08 UTC
  2 points
  Parent
  Ben, thank you! Sent you an email.
Stuart_Armstrong 5 Nov 2017 19:16 UTC
6 points
Should I submit? Working on this is my job, so it’s maybe better to encourage others to come on board?
- cousin_it 5 Nov 2017 19:45 UTC
  5 points
  Parent
  We certainly don’t want to exclude experts! Please do submit.
Zvi 5 Nov 2017 16:59 UTC
6 points
What should we be doing to help get more people to enter, whether by spreading the word or another way? We want this to work and result in good things, and it’s iteration one, so doubtless there’s a lot we’re not doing right.
- whpearson 5 Nov 2017 17:37 UTC
  8 points
  Parent
  Some random initial thoughts.
  Post on the SSC open thread? Or the EA forum open thread (maybe the EA subreddit too). I’ve seen it posted to the control problem reddit.
  I’ll post it on the ai danmark safety facebook page, although I’ve never managed to go to one of their reading groups (it is now pending).
  Ask the people running lesserwrong nicely whether you can see the referrer for where traffic comes in to this thread; this will give you an idea where most of the traffic comes from.
  To get more people to enter, imagine you were running the competition previously, pick N articles out there on the internet and link them as things that would be short listed. This would give people an idea of what you are looking for. Try and pick a diverse range else you might get articles in a cluster.
  Perhaps think about trying to get some publicity to sweeten the deal, e.g. the winner also gets featured in X prestigious place (if the submitter wants it to be). Although maybe only after the quality has been shown to be high enough, after the first couple of iterations.
  - habryka 6 Nov 2017 2:23 UTC
    1 point
    Parent
    Happy to give you any analytics data for the page.
    - cousin_it 6 Nov 2017 10:33 UTC
      2 points
      Parent
      Can you send them to Zvi and me? (vladimir.slepnev at gmail, thezvi at gmail)
- Raemon 5 Nov 2017 19:49 UTC
  6 points
  Parent
  Yeah, I had an initial gut sense of “oh man this seems important but I’m worried it’d quietly fade out of consciousness by default.” Much of my advice would be whpearson’s. Some additional thoughts (I think mostly fleshing out why I think whpearson’s suggestions are important)
  i. Big Activation Costs
  You are asking people to do a hard thing. You’re providing money to incentivize them, but people are lazy: they will forget, or start doing it but never get around to finishing, or not finish until too late.
  Anything to reduce the activation cost is good.
  1) Maybe have the first thing you ask be for people to apply if they might be interested, with as low a cost to doing so as possible (while gaining at least some information about people and weeding out dead-wood).
  This gets people slightly committed, and gives you the opportunity to spam a much narrower subset of people to remind them. (see spam section)
  2) It’s ambiguous to me what kind of writing you’re looking for, which in turn makes me unsure if it’d be a good use of my time to work on this, which makes me hesitate. (I’m currently assuming that this is not the right use of my talents both for altruistic and selfish reasons, but I can imagine a slightly different version of me for whom it’d be ambiguous)
  Whpearson’s “list good existing articles, as diverse as possible” helps counteract part of this, but still doesn’t answer questions like “should I be doing this if this is currently my day job? Presumably the point is to get more people working on this.” (and the corollary: if professional AI safety workers are submitting, what chance do I have of contributing something useful?)
  (Relatedly—I’d originally thought you should spell out what sort of questions you were looking to resolve, then saw you had linked to Paul Christiano’s doc. I think attempting to summarize the doc might accidentally focus on too narrow a domain, but the current linking of the doc is so small I missed it the first time)
  ii. Spam vs Valuable-Self-Promoting-Machinery
  By default, you need to spam things a lot. One way to get the word out is to post on all the relevant FB groups, discords, etc—multiple times, so that when they forget and fade to the backburner it doesn’t disappear forever.
  Being forced to spam everyone once a week is a bad equilibrium. If you can figure out how to spam exactly the people who matter (see i.1) that’s also better.
  If you can spam in a way that’s providing value rather than sucking up attention, that’s better. If you can make the thing spam itself in a way that provides value, better still.
  One way of spamming that provides value might be having a couple of followup posts that do things like “provide suggestions and reading lists for people who are considering working on this but don’t quite know how to approach the problem.” (targeting the sort of person who you think almost has the skills to contribute, and is just missing a few key elements that are easy to teach)
  Another might be encouraging people to post their drafts publicly to attract additional attention and comments that keep the thing in public consciousness. (This may work against the contest model though)
- avturchin 5 Nov 2017 17:34 UTC
  2 points
  Parent
  Put it in all relevant facebook groups?
  - habryka 6 Nov 2017 2:23 UTC
    1 point
    Parent
    I think this is the most important thing, and I would be happy to help with that.
    - cousin_it 6 Nov 2017 10:36 UTC
      2 points
      Parent
      That would be great! Do you need any input from us to do it?
Dr_Manhattan 3 Nov 2017 15:51 UTC
6 points
Zvi/Vladimir, what’s your role in this—are you the judges?
- cousin_it 3 Nov 2017 16:04 UTC
  6 points
  Parent
  Organizing and judging. Might bring in other judges if things go well.
Dan Fitch 9 Dec 2017 4:33 UTC
5 points
I don’t know if this is a useful “soft” submission, considering I am still reading and learning in the area.
But I think the current metaphors (paperclips, etc.) are not very persuasive for convincing folks in the world at large that value alignment is a BIG, HARD PROBLEM. Here is my attempt to add a possibly-new metaphor to the mix: https://nilscript.wordpress.com/2017/11/26/parenting-alignment-problem/
- cousin_it 31 Dec 2017 12:57 UTC
  2 points
  Parent
  Thank you! Acknowledged. Do you have an email address for contact?
shminux 31 Dec 2017 2:06 UTC
4 points
Posted on my blog, but might as well link it here. Not of the quality that Paul Christiano seeks, but might be of some interest, though many of the same points have been discussed over and over here and elsewhere before.
- Defective Altruist 31 Dec 2017 4:51 UTC
  3 points
  Parent
  Your link is broken, did you mean to link https://edgeofgravity.wordpress.com/2017/12/31/ai-alignment-bubblicious/ ?
- cousin_it 31 Dec 2017 13:00 UTC
  2 points
  Parent
  Thank you! Acknowledged (though your link didn’t work for me, Defective Altruist’s did). Do you have an email address for contact?
  - shminux 7 Jan 2018 8:45 UTC
    2 points
    Parent
    shminux at gmail should work. Thank you for the acknowledgment! Tried to fix the link above, not sure how well it worked.
Defective Altruist 29 Dec 2017 9:43 UTC
4 points
Submission: http://futilitymonster.tumblr.com/post/169068920534/the-true-ai-box
Contact: defectivealtruist at g mail
- cousin_it 31 Dec 2017 12:52 UTC
  2 points
  Parent
  Thank you! Acknowledged.
Daniel Wallis 28 Dec 2017 20:47 UTC
4 points
Is “publishing” on google docs ok? Here’s a link:
https://docs.google.com/document/d/1hIzJNNDfWKwAK-lSgs_w-CYa15b0SjdRNDzmh5jJMMU/edit?usp=sharing
- cousin_it 31 Dec 2017 12:52 UTC
  2 points
  Parent
  Thank you! Acknowledged. Do you have an email address for contact?
  - Daniel Wallis 31 Dec 2017 19:40 UTC
    1 point
    Parent
    daniel.wallis.9000@gmail.com
Joseph Shipman 12 Nov 2017 18:51 UTC
4 points
OK, I went on a rant and revived my blog after 4 years of inactivity because entries aren’t supposed to be entered as comments but are supposed to be linked to instead.
https://polymathblogger.wordpress.com/2017/11/12/acknowledge-the-elephant-entry-for-ai-alignment-prize/
- cousin_it 12 Nov 2017 23:12 UTC
  2 points
  Parent
  Thanks! Can you give your email address so we can send feedback?
  - Joseph Shipman 13 Nov 2017 3:04 UTC
    1 point
    Parent
    Just comment on the blog or here or both, if you want to send private feedback try JoeShipman a-with-a-circle-around-it aol end-of-sentence-punctuation com
Sergej Xarkonnen 6 Nov 2017 14:58 UTC
4 points
my idea: https://docs.google.com/document/d/e/2PACX-1vQ3131oaC2JhxafeR77x3nbuOcPRoxLFI0PQvxcYt6N8IqK-FFV6mcK3CMXeEpZlTxjSmSXpvYYbbq7/pub
- cousin_it 6 Nov 2017 15:12 UTC
  2 points
  Parent
  Thanks! Sent you feedback by email.
jimrandomh 4 Nov 2017 18:43 UTC
4 points
Are there any limitations on number of submissions per person (where each submission is a distinct idea)? On number of wins per person?
- cousin_it 4 Nov 2017 20:11 UTC
  3 points
  Parent
  One win per person, and it’s okay to have many ideas. Might be more convenient if you submit them as one package though.
John_Maxwell 31 Dec 2017 21:24 UTC
3 points
Here’s my entry: Friendly AI through Ontology Autogeneration. Am I allowed to keep making improvements to it even after the deadline has passed? (Doing so at my own risk, i.e. if it so happens that you’ve already read & judged my essay before I make my improvements, and my improvements aren’t going to affect my chances of winning, that’s my problem.)
- cousin_it 31 Dec 2017 23:44 UTC
  2 points
  Parent
  Can you make a snapshot frozen at the moment of deadline and give us a URL to it? That would be the most fair decision for the other contestants, I think.
  - John_Maxwell 1 Jan 2018 5:20 UTC
    2 points
    Parent
    OK, I won’t make further modifications to the version at the URL in my comment above.
    EDIT: Now that the judging is over, I am making some modifications, but nothing major.
Patterns_Everywhere 28 Dec 2017 18:38 UTC
3 points
Here’s my entry. I think it’s what you want… Hosted on DocDroid.
http://docdro.id/bUVo61P
- Patterns_Everywhere 31 Dec 2017 5:04 UTC
  1 point
  Parent
  I was going to add another section to the above report with diagrams and explanations but I wouldn’t get to finish it like I wanted to in time. But if you want the basic diagram with no explanations to understand it better I just uploaded the basic flowchart.
  http://docdro.id/hK8OpYJ
  Just apply the document sections to the parts.
  - cousin_it 31 Dec 2017 12:54 UTC
    3 points
    Parent
    Thank you! Acknowledged. Do you have an email address for contact?
    - Patterns_Everywhere 31 Dec 2017 17:15 UTC
      1 point
      Parent
      Just sent an Email to the contest Email listed at the top. I assume that is fine.
      Happy New Years Everyone!
Jeremy Popejoy 2 Dec 2017 21:50 UTC
3 points
Hello :) I’ve created this as a framework for guiding our future with AI http://peridotai.com/call-to-artists/ AND to bootstrap interest in my art and thoughts here at https://quantumsymbol.com
- cousin_it 31 Dec 2017 12:58 UTC
  2 points
  Parent
  Thank you! Acknowledged.
Joseph Shipman 11 Nov 2017 23:19 UTC
3 points
You should think about the incentives of posting early in the 2 month window rather than late. Later entries will be influenced by earlier entries so you have a misalignment between wanting to win the prize and wanting to advance the conversation sooner. Christiano ought to announce that if one entry builds in a valuable way on an earlier entry by someone else, the earlier submitter will also gain subjective judgy-points in a way that he, Paul, affirms is calibrated neither to penalize early entry nor to discourage work that builds on earlier entries.
Adrià Garriga-alonso 10 Nov 2017 13:23 UTC
3 points
Is it possible to enter the contest as a group? Meaning, can the article written for the contest have several coauthors?
- cousin_it 10 Nov 2017 16:13 UTC
  2 points
  Parent
  Yes!
- whpearson 10 Nov 2017 13:35 UTC
  2 points
  Parent
  Jacob Edelman already asked this question and got an affirmative.
James D Miller 5 Nov 2017 18:10 UTC
3 points
Are you looking for entries with actionable information, or would you be interested in a paper showing, for example, that AI alignment might not be as big a problem as we thought but not for a reason that will help us solve the AI alignment problem?
- cousin_it 5 Nov 2017 18:37 UTC
  4 points
  Parent
  Yes, a paper like that could qualify.
whpearson 3 Nov 2017 16:04 UTC
3 points
Should’ve saved my decision alignment loop post a few days. Maybe an expansion of it? Hmm.
- cousin_it 3 Nov 2017 16:06 UTC
  4 points
  Parent
  Yes, an expansion of that post would qualify.
  - whpearson 4 Nov 2017 13:16 UTC
    2 points
    Parent
    How much should I try to make it self-contained?
    - cousin_it 4 Nov 2017 17:53 UTC
      2 points
      Parent
      I’d prefer a self-contained thing. In the extreme case (which might not apply to you), an entry with many links to the author’s previous writings might be hard to judge unless these writings are already well known.
      - whpearson 12 Nov 2017 18:59 UTC
        2 points
        Parent
        Here is a draft
ExCeph 1 Jan 2018 3:41 UTC
2 points
Submitting this entry for your consideration: https://www.lesserwrong.com/posts/bkoeQLTBbodpqHePd/ai-goal-alignment-entry-how-to-teach-a-computer-to-love. I’ll email it as well. Your commitment to this call for ideas is much appreciated!
Berick Cook 1 Jan 2018 1:58 UTC
2 points
My submission is on my project blog: https://airis-ai.com/2017/12/31/friendly-ai-via-agency-sustainment/
Thank you for hosting this excellent competition! It was very inspiring. This is an idea I’ve been bouncing around in the back of my mind for several months now, and it is your competition that prompted me to refine it, flesh it out, and put it to paper.
My contact email is berickcook@gmail.com
- cousin_it 1 Jan 2018 11:42 UTC
  3 points
  Parent
  Acknowledged, thank you!
interstice 31 Dec 2017 20:18 UTC
2 points
Submission:
https://www.lesserwrong.com/posts/ytph8t6AcxPcmJtDh/formal-models-of-complexity-and-evolution
- cousin_it 1 Jan 2018 0:05 UTC
  2 points
  Parent
  Received, thank you! Can you give a contact email address?
  - interstice 3 Jan 2018 22:32 UTC
    1 point
    Parent
    usernameneeded@gmail.com
    Hope it’s not too late, but I also meant for this post(linked in original) to be part of my entry:
    https://www.lesserwrong.com/posts/ra4yAMf8NJSzR9syB/a-candidate-complexity-measure
Roland Pihlakas 31 Dec 2017 18:09 UTC
2 points
Hello! My newest proposal:
https://medium.com/threelaws/making-ai-less-dangerous-2742e29797bd
I would like to propose a certain kind of AI goal structures that would be an alternative to utility maximisation based goal structures. The proposed alternative framework would make AI significantly safer, though it would not guarantee total safety. It can be used at strong AI level and also much below, so it is well scalable. The main idea would be to replace utility maximisation with the concept of homeostasis.
- cousin_it 1 Jan 2018 0:07 UTC
  3 points
  Parent
  Acknowledged, thank you!
Glen Slade 31 Dec 2017 12:42 UTC
2 points
Hi all. I have posted my competition entry on my blog here:
https://glenslade.wordpress.com/2017/12/31/the-role-of-learning-and-responsibility-in-ai-alignment/
Happy holidays!
- cousin_it 31 Dec 2017 13:01 UTC
  2 points
  Parent
  Thank you for the entry! Do you have an email address for contact?
  - Glen Slade 31 Dec 2017 13:51 UTC
    1 point
    Parent
    Hi cousin_it
    I emailed a pdf to the competition address so hopefully you can access my email there.
    If not, please let me know the best way to send to you without posting it publicly.
    Thanks
alexsalt 8 Nov 2017 16:14 UTC
2 points
My entry:
Raising Moral AI
Is it easier to teach a robot to stay safe by not tearing off its own limbs and not drilling holes in its head and not touching lava and not falling from a cliff and so on ad infinitum, or to introduce pain as an inescapable consequence of such actions and let the robot experiment and learn?
Similarly, while trying to create a safe AGI, it is futile to make an exhaustive and non-contradictory set of rules (values, policies, laws, committees) due to infinite complexity. A powerful AGI agent might find an exception or conflict in the rules and become dangerous instantly. A better approach would be to let AGI go through experiences similar to those humans went through.
1) Create a virtual world similar to our own and fill it with AGI agents with intelligence comparable to current humans. It would be preferable if agents did not even know their nature and how they are made, to avoid intelligence explosion.
2) Choose agents that are the most safe to other agents (and humans) by observing and analyzing their behavior over long periods of time. This is the most critical step. Since agents will communicate with other agents while living in that world, living through good and bad events, through suffering, losses, and happiness, they will learn what is good and what is bad and can “become good” naturally. Then we need to choose the best of them. Someone on the level of Gandhi.
3) Bring the best AGI agents to our world.
4) It is not out of the question that our world is actually a computer simulation and our civilization is actually a testing ground for such AGI agents. After “death”, the best individuals are transferred to “heaven” (real world). It would also explain Fermi’s paradox—nobody is out there because for the purposes of testing AGI there is no reason to simulate other civilizations in our universe.
If there are good people who don’t hurt other people needlessly, it’s not because there is a set of rules in them or list of values. Rules and values are mostly emerging properties, based on memories and experiences. Memories of being hurt, experiences of sadness and loss and love and despair. It is an essence, an amalgamation, of a whole life’s experiences. Values can be formulated and deduced, but they cannot be transferred into a new AGI entity without actual memories. Good AGI must be raised and nurtured, not constructed from cold rules.
There is no need to repeat the whole process of human civilization development. Some shortcuts are possible (and necessary) for many reasons. One being the non-biological nature of AGI, where hard coding makes development and upgrades easier and history running much faster. But implementing the majority of human qualities cannot be avoided, otherwise AGI will be too alien to human values and therefore again dangerous.
- Justus Eapen 16 Nov 2017 19:33 UTC
  0 points
  Parent
  I don’t see why this has been downvoted so many times. It is likely to be the only way of ensuring the value-alignment we seek. It is based on ancient wisdom (cut the trees that bear bad fruit) and prioritizes safety by cordoning off AGI agents.
Jacob Edelman 6 Nov 2017 1:56 UTC
2 points
Are teams allowed to make submissions?
- cousin_it 6 Nov 2017 9:25 UTC
  2 points
  Parent
  Yes.
avturchin 4 Nov 2017 11:06 UTC
2 points
I have unpublished text on the topic and will put a draft online in the next couple of weeks, and will submit it to the competition. I will add the URL here when it is ready.
- cousin_it 4 Nov 2017 11:37 UTC
  4 points
  Parent
  Good. If you submit it early we’ll give feedback so you can improve it before the deadline if needed.
  - avturchin 25 Nov 2017 11:38 UTC
    1 point
    Parent
    https://www.lesserwrong.com/posts/CDWsjQr8KDuj69fTJ/message-to-any-future-ai-there-are-several-instrumental
    This is my early submission and I hope to get useful feedback.
  - avturchin 4 Nov 2017 12:07 UTC
    1 point
    Parent
    Thanks! I will reread it and submit soon.
alexsalt 16 Jan 2018 16:40 UTC
1 point
Any winners?
- Gyrodiot 16 Jan 2018 17:35 UTC
  1 point
  Parent
  Winners have just been announced here.