# Announcing the AI Alignment Prize

Stronger-than-human artificial intelligence would be dangerous to humanity. It is vital that any such intelligence’s goals are aligned with humanity’s goals. Maximizing the chance that this happens is a difficult, important and under-studied problem.

To encourage more and better work on this important problem, we (Zvi Mowshowitz and Vladimir Slepnev) are announcing a $5000 prize for publicly posted work advancing understanding of AI alignment, funded by Paul Christiano.

This prize will be awarded based on entries gathered over the next two months. If the prize is successful, we will award further prizes in the future.

The prize is not backed by or affiliated with any organization.

## Rules

Your entry must be published online for the first time between November 3 and December 31, 2017, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!

Your entry must be written by you, and submitted before 9pm Pacific Time on December 31, 2017. Submit your entries either as links in the comments to this post, or by email to apply@ai-alignment.com. We may provide feedback on early entries to allow improvement.

We will award $5000 to between one and five winners. The first place winner will get at least $2500. The second place winner will get at least $1000. Other winners will get at least $500.

En­tries will be judged sub­jec­tively. Fi­nal judg­ment will be by Paul Chris­ti­ano. Prizes will be awarded on or be­fore Jan­uary 15, 2018.

## What kind of work are we look­ing for?

AI Align­ment fo­cuses on ways to en­sure that fu­ture smarter than hu­man in­tel­li­gence will have goals al­igned with the goals of hu­man­ity. Many ap­proaches to AI Align­ment de­serve at­ten­tion. This in­cludes tech­ni­cal and philo­soph­i­cal top­ics, as well as strate­gic re­search about re­lated so­cial, eco­nomic or poli­ti­cal is­sues. A non-ex­haus­tive list of tech­ni­cal and other top­ics can be found here.

We are not in­ter­ested in re­search deal­ing with the dan­gers of ex­ist­ing ma­chine learn­ing sys­tems com­monly called AI that do not have smarter than hu­man in­tel­li­gence. Th­ese con­cerns are also un­der­stud­ied, but are not the sub­ject of this prize ex­cept in the con­text of fu­ture smarter than hu­man in­tel­li­gence. We are also not in­ter­ested in gen­eral AI re­search. We care about AI al­ign­ment, which may or may not also ad­vance the cause of gen­eral AI re­search.

• Here is my sub­mis­sion.

Thank you for mo­ti­vat­ing me to write this blog post I have been putting off for a while.

Disclaimer: If you want to only measure the contribution that came in November or later, compare to this post, which has one fewer category, no names, fewer examples, nothing about mitigation, and worse presentation.

I think this is an im­por­tant idea, so I ap­pre­ci­ate feed­back, es­pe­cially about pre­sen­ta­tion.

• Ac­knowl­edged, and thank you! I think that’s a great post. If you want to make it bet­ter, the eas­iest way is by adding sim­ple real world ex­am­ples for each sec­tion.

• Another Dis­claimer: The out­line at the be­gin­ning was added af­ter the dead­line, thanks to Rae­mon and other peo­ple who pro­vided ex­am­ples.

• Should I sub­mit? Work­ing on this is my job, so it’s maybe bet­ter to en­courage oth­ers to come on board?

• I saw a talk ear­lier this year that men­tioned this 2015 Cor­rigi­bil­ity pa­per as a good start­ing point for some­one new to al­ign­ment re­search. If that’s still true, I started writ­ing up some thoughts on a pos­si­ble gen­er­al­iza­tion of the method in that pa­per.

Any­way, sub­mit­ting this draft early to hope­fully get some feed­back whether I’m on the right track:

The new ver­sion does bet­ter on sub-agent shut­down and elimi­nates the “man­ag­ing the news” prob­lem.

(Let me know if some­one already thought of this ap­proach!)

EDIT 2017-11-09: filled in the sec­tion on the -ac­tion model.

• You don’t men­tion de­ci­sion the­ory in your list of top­ics, but I guess it doesn’t hurt to try.

I have thought a bit about what one might call the “im­ple­men­ta­tion prob­lem of de­ci­sion the­ory”. Let’s say you be­lieve that some the­ory of ra­tio­nal de­ci­sion mak­ing, e.g., ev­i­den­tial or up­date­less de­ci­sion the­ory, is the right one for an AI to use. How would you de­sign an AI to be­have in ac­cor­dance with such a nor­ma­tive the­ory? Con­versely, if you just go ahead and build a sys­tem in some ex­ist­ing frame­work, how would that AI be­have in New­comb-like prob­lems?

There are two pieces that I up­loaded/​finished on this topic in Novem­ber and De­cem­ber. The first is a blog post not­ing that futarchy-type ar­chi­tec­tures would, per de­fault, im­ple­ment ev­i­den­tial de­ci­sion the­ory. The sec­ond is a draft ti­tled “Ap­proval-di­rected agency and the de­ci­sion the­ory of New­comb-like prob­lems”.

For any­one who’s in­ter­ested in this topic, here are some other re­lated pa­pers and blog posts:

So far, my research and the papers by others I linked have focused on classic Newcomb-like problems. One could also discuss how existing AI paradigms relate to other issues of naturalized agency, in particular self-locating beliefs and naturalized induction, though here it seems more as though existing frameworks just lead to really messy behavior.

Send comments to firstnameDOTlastnameATfoundational-researchDOTorg. (Of course, you can also comment here or send me a LW PM.)

• What should we be doing to help get more people to enter, whether by spreading the word or another way? We want this to work and result in good things, and it’s iteration one, so there’s doubtless a lot we’re not doing right.

• Yeah, I had an initial gut sense of “oh man this seems important but I’m worried it’d quietly fade out of consciousness by default.” Much of my advice would be whpearson’s. Some additional thoughts (I think mostly fleshing out why I think whpearson’s suggestions are important):

i. Big Ac­ti­va­tion Costs

You are asking people to do a hard thing. You’re providing money to incentivize them, but people are lazy: they will forget, or start doing it but not get around to finishing, or not get around to finishing until too late.

Any­thing to re­duce the ac­ti­va­tion cost is good.

1) Maybe have the first thing you ask be for people to apply if they might be interested, with as low a cost to doing so as possible (while gaining at least some information about people and weeding out dead-wood).

This gets peo­ple slightly com­mit­ted, and gives you the op­por­tu­nity to spam a much nar­rower sub­set of peo­ple to re­mind them. (see spam sec­tion)

2) It’s ambiguous to me what kind of writing you’re looking for, which in turn makes me unsure if it’d be a good use of my time to work on this, which makes me hesitate. (I’m currently assuming that this is not the right use of my talents both for altruistic and selfish reasons, but I can imagine a slightly different version of me for whom it’d be ambiguous.)

Whpearson’s “list good existing articles, as diverse as possible” helps counteract part of this, but still doesn’t answer questions like “should I be doing this if this is currently my day job? Presumably the point is to get more people working on this.” (and the corollary: if professional AI safety workers are submitting, what chance do I have of contributing something useful?)

(Re­lat­edly—I’d origi­nally thought you should spell out what sort of ques­tions you were look­ing to re­solve, then saw you had linked to Paul Chris­ti­ano’s doc. I think at­tempt­ing to sum­ma­rize the doc might ac­ci­den­tally fo­cus on too nar­row a do­main, but the cur­rent link­ing of the doc is so small I missed it the first time)

ii. Spam vs Valuable-Self-Pro­mot­ing-Machinery

By de­fault, you need to spam things a lot. One way to get the word out is to post on all the rele­vant FB groups, dis­cords, etc—mul­ti­ple times, so that when they for­get and fade to the back­burner it doesn’t dis­ap­pear for­ever.

Be­ing forced to spam ev­ery­one once a week is a bad equil­ibrium. If you can figure out how to spam ex­actly the peo­ple who mat­ter (see i.1) that’s also bet­ter.

If you can spam in a way that’s pro­vid­ing value rather than suck­ing up at­ten­tion, that’s bet­ter. If you can make the thing spam it­self in a way that pro­vides value, bet­ter still.

One way of spamming-that-provides value might be having a couple of followup posts that do things like “provide suggestions and reading lists for people who are considering working on this but don’t quite know how to approach the problem.” (targeting the sort of person who you think almost has the skills to contribute, and is just missing a few key elements that are easy to teach)

Another might be encouraging people to post their drafts publicly to attract additional attention and comments that keep the thing in public consciousness. (This may work against the contest model though.)

• Some ran­dom ini­tial thoughts.

Post on the SSC open thread? Or the EA forum open thread (maybe the EA subreddit too). I’ve seen it posted to the control problem reddit.

I’ll post it on the ai dan­mark safety face­book page, al­though I’ve never man­aged to go to one of their read­ing groups (it is now pend­ing).

Ask the people running lesserwrong nicely whether you can see the referrers for traffic coming in to this thread; this will give you an idea of where most of the traffic comes from.

To get more people to enter, imagine you had run the competition previously: pick N articles out there on the internet and link them as things that would have been shortlisted. This would give people an idea of what you are looking for. Try to pick a diverse range, or else you might get articles in a cluster.

Per­haps think about try­ing to get some pub­lic­ity to sweeten the deal, e.g. the win­ner also gets fea­tured in X pres­ti­gious place (if the sub­mit­ter wants it to be). Although maybe only af­ter the qual­ity has been shown to be high enough, af­ter the first cou­ple of iter­a­tions.

• Put it in all rele­vant face­book groups?

• I think this is the most im­por­tant thing, and I would be happy to help with that.

• I don’t know if this is a use­ful “soft” sub­mis­sion, con­sid­er­ing I am still read­ing and learn­ing in the area.

But I think the cur­rent metaphors (pa­per­clips, etc.) are not very per­sua­sive for con­vinc­ing folks in the world at large that value al­ign­ment is a BIG, HARD PROBLEM. Here is my at­tempt to add a pos­si­bly-new metaphor to the mix: https://​​nilscript.word­press.com/​​2017/​​11/​​26/​​par­ent­ing-al­ign­ment-prob­lem/​​

• Con­tact: defec­tivealtru­ist at g mail

• OK, I went on a rant and re­vived my blog af­ter 4 years of in­ac­tivity be­cause en­tries aren’t sup­posed to be en­tered as com­ments but are sup­posed to be linked to in­stead.

https://​​poly­math­blog­ger.word­press.com/​​2017/​​11/​​12/​​ac­knowl­edge-the-elephant-en­try-for-ai-al­ign­ment-prize/​​

• Just com­ment on the blog or here or both, if you want to send pri­vate feed­back try JoeShip­man a-with-a-cir­cle-around-it aol end-of-sen­tence-punc­tu­a­tion com

• Are you look­ing for en­tries with ac­tion­able in­for­ma­tion, or would you be in­ter­ested in a pa­per show­ing, for ex­am­ple, that AI al­ign­ment might not be as big a prob­lem as we thought but not for a rea­son that will help us solve the AI al­ign­ment prob­lem?

• Yes, a pa­per like that could qual­ify.

• Posted on my blog, but might as well link it here. Not of the quality that Paul Christiano seeks, but might be of some interest, though many of the same points have been discussed over and over here and elsewhere before.

• You should think about the in­cen­tives of post­ing early in the 2 month win­dow rather than late. Later en­tries will be in­fluenced by ear­lier en­tries so you have a mis­al­ign­ment be­tween want­ing to win the prize and want­ing to ad­vance the con­ver­sa­tion sooner. Chris­ti­ano ought to an­nounce that if one en­try builds in a valuable way on an ear­lier en­try by some­one else, the ear­lier sub­mit­ter will also gain sub­jec­tive judgy-points in a way that he, Paul, af­firms is cal­ibrated nei­ther to pe­nal­ize early en­try nor to dis­cour­age work that builds on ear­lier en­tries.

• Here’s my en­try. I think it’s what you want… Hosted on DocDroid.

http://​​doc­dro.id/​​bUVo61P

• I was go­ing to add an­other sec­tion to the above re­port with di­a­grams and ex­pla­na­tions but I wouldn’t get to finish it like I wanted to in time. But if you want the ba­sic di­a­gram with no ex­pla­na­tions to un­der­stand it bet­ter I just up­loaded the ba­sic flowchart.

Just ap­ply the doc­u­ment sec­tions to the parts.

• Just sent an Email to the con­test Email listed at the top. I as­sume that is fine.

Happy New Years Every­one!

• Hello :) I’ve cre­ated this as a frame­work for guid­ing our fu­ture with AI http://​​peri­do­tai.com/​​call-to-artists/​​ AND to boot­strap in­ter­est in my art and thoughts here at https://​​quan­tum­sym­bol.com

• Is it pos­si­ble to en­ter the con­test as a group? Mean­ing, can the ar­ti­cle writ­ten for the con­test have sev­eral coau­thors?

• Yes!

• Are teams al­lowed to make sub­mis­sions?

• Yes.

• Are there any limi­ta­tions on num­ber of sub­mis­sions per per­son (where each sub­mis­sion is a dis­tinct idea)? On num­ber of wins per per­son?

• One win per per­son, and it’s okay to have many ideas. Might be more con­ve­nient if you sub­mit them as one pack­age though.

• Should’ve saved my decision alignment loop post a few days. Maybe an expansion of it? Hmm.

• Yes, an ex­pan­sion of that post would qual­ify.

• How much should I try to make it self-con­tained?

• I’d pre­fer a self-con­tained thing. In the ex­treme case (which might not ap­ply to you), an en­try with many links to the au­thor’s pre­vi­ous writ­ings might be hard to judge un­less these writ­ings are already well known.

• Here’s my en­try: Friendly AI through On­tol­ogy Au­to­gen­er­a­tion. Am I al­lowed to keep mak­ing im­prove­ments to it even af­ter the dead­line has passed? (Do­ing so at my own risk, i.e. if it so hap­pens that you’ve already read & judged my es­say be­fore I make my im­prove­ments, and my im­prove­ments aren’t go­ing to af­fect my chances of win­ning, that’s my prob­lem.)

• Can you make a snap­shot frozen at the mo­ment of dead­line and give us a URL to it? That would be the most fair de­ci­sion for the other con­tes­tants, I think.

• OK, I won’t make fur­ther mod­ifi­ca­tions to the ver­sion at the URL in my com­ment above.

EDIT: Now that the judg­ing is over, I am mak­ing some mod­ifi­ca­tions, but noth­ing ma­jor.

• Sub­mit­ting this en­try for your con­sid­er­a­tion: https://​​www.lesser­wrong.com/​​posts/​​bkoeQLTB­bod­pqHePd/​​ai-goal-al­ign­ment-en­try-how-to-teach-a-com­puter-to-love. I’ll email it as well. Your com­mit­ment to this call for ideas is much ap­pre­ci­ated!

• I have unpublished text on the topic and will put a draft online in the next couple of weeks, and will submit it to the competition. I will add the URL here when it is ready.

• My sub­mis­sion is on my pro­ject blog: https://​​airis-ai.com/​​2017/​​12/​​31/​​friendly-ai-via-agency-sus­tain­ment/​​

Thank you for host­ing this ex­cel­lent com­pe­ti­tion! It was very in­spiring. This is an idea I’ve been bounc­ing around in the back of my mind for sev­eral months now, and it is your com­pe­ti­tion that prompted me to re­fine it, flesh it out, and put it to pa­per.

My con­tact email is berick­cook@gmail.com

• https://medium.com/threelaws/making-ai-less-dangerous-2742e29797bd

I would like to pro­pose a cer­tain kind of AI goal struc­tures that would be an al­ter­na­tive to util­ity max­imi­sa­tion based goal struc­tures. The pro­posed al­ter­na­tive frame­work would make AI sig­nifi­cantly safer, though it would not guaran­tee to­tal safety. It can be used at strong AI level and also much be­low, so it is well scal­able. The main idea would be to re­place util­ity max­imi­sa­tion with the con­cept of home­osta­sis.
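A toy sketch of the contrast described above, under assumptions that are not taken from the linked article: a single scalar "resource level", a hypothetical setpoint of 10, and hypothetical function names. It only illustrates that a maximising objective always prefers more, while a homeostatic objective prefers staying near a target.

```python
def maximising_utility(resource: float) -> float:
    # Utility maximisation (illustrative): more of the resource is always better.
    return resource


def homeostatic_utility(resource: float, setpoint: float = 10.0) -> float:
    # Homeostasis (illustrative): utility falls off with distance from a setpoint.
    # The setpoint value and the quadratic penalty are assumptions for this sketch.
    return -(resource - setpoint) ** 2


def best_level(utility, candidates):
    # Pick the candidate resource level that the given utility function prefers.
    return max(candidates, key=utility)


# Hypothetical resource levels an agent could steer towards.
levels = [0.0, 5.0, 10.0, 50.0, 1000.0]

best_for_maximiser = best_level(maximising_utility, levels)
best_for_homeostat = best_level(homeostatic_utility, levels)

print("maximiser prefers:", best_for_maximiser)
print("homeostat prefers:", best_for_homeostat)
```

In this toy setting the maximiser picks the largest level on offer, whatever it is, while the homeostatic objective picks the level closest to its setpoint and gains nothing from acquiring more.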

• Hi cousin_it

I emailed a pdf to the com­pe­ti­tion ad­dress so hope­fully you can ac­cess my email there.

If not, please let me know the best way to send it to you without posting it publicly.

Thanks

• Any win­ners?

• My en­try:

Rais­ing Mo­ral AI

Is it easier to teach a robot to stay safe by not tearing off its own limbs and not drilling holes in its head and not touching lava and not falling from a cliff and so on ad infinitum, or to introduce pain as an inescapable consequence of such actions and let the robot experiment and learn?

Similarly, while trying to create a safe AGI, it is futile to make an exhaustive and non-contradictory set of rules (values, policies, laws, committees) due to infinite complexity. A powerful AGI agent might find an exception or conflict in rules and become dangerous instantly. A better approach would be to let AGI go through experiences similar to those humans went through.

1) Create a vir­tual world similar to our own and fill it with AGI agents with in­tel­li­gence com­pa­rable to cur­rent hu­mans. It would be prefer­able if agents did not even know their na­ture and how they are made, to avoid in­tel­li­gence ex­plo­sion.

2) Choose agents that are the most safe to other agents (and hu­mans) by ob­serv­ing and an­a­lyz­ing their be­hav­ior over long pe­ri­ods of time. This is the most crit­i­cal step. Since agents will com­mu­ni­cate with other agents while liv­ing in that world, liv­ing through good and bad events, through suffer­ing, losses, and hap­piness, they will learn what is good and what is bad and can “be­come good” nat­u­rally. Then we need to choose the best of them. Some­one on the level of Gandhi.

3) Bring the best AGI agents to our world.

4) It is not out of the ques­tion that our world is ac­tu­ally a com­puter simu­la­tion and our civ­i­liza­tion is ac­tu­ally a test­ing ground for such AGI agents. After “death”, the best in­di­vi­d­u­als are trans­ferred to “heaven” (real world). It would also ex­plain Fermi’s para­dox—no­body is out there be­cause for the pur­poses of test­ing AGI there is no rea­son to simu­late other civ­i­liza­tions in our uni­verse.

If there are good people who don’t hurt other people needlessly, it’s not because there is a set of rules in them or list of values. Rules and values are mostly emerging properties, based on memories and experiences. Memories of being hurt, experiences of sadness and loss and love and despair. It is an essence, an amalgamation, of a whole life’s experiences. Values can be formulated and deduced, but they cannot be transferred into a new AGI entity without actual memories. Good AGI must be raised and nurtured, not constructed from cold rules.

There is no need to re­peat the whole pro­cess of hu­man civ­i­liza­tion de­vel­op­ment. Some short­cuts are pos­si­ble (and nec­es­sary) for many rea­sons. One be­ing the non-biolog­i­cal na­ture of AGI, where hard cod­ing makes de­vel­op­ment and up­grades eas­ier and his­tory run­ning much faster. But im­ple­ment­ing the ma­jor­ity of hu­man qual­ities can­not be avoided, oth­er­wise AGI will be too alien to hu­man val­ues and there­fore again dan­ger­ous.

• I don’t see why this has been down­voted so many times. It is likely to be the only way of en­sur­ing the value-al­ign­ment we seek. It is based on an­cient wis­dom (cut the trees that bear bad fruit) and pri­ori­tizes safety by cor­don­ing off AGI agents.