The Alignment Newsletter #4: 04/30/18

Highlights

Reptile: A Scalable Meta-Learning Algorithm (Alex Nichol et al): I somehow forgot to include this in past emails, so I'm including it now. Reptile is an algorithm for meta-learning, and in this paper it is applied to few-shot classification, where given a few examples of different classes, you must learn a good classification algorithm for those classes. The authors show using a Taylor expansion that MAML and Reptile have very similar gradients to first order in alpha, the step size. Their evaluation shows that for the few-shot classification case, Reptile and MAML perform similarly (though they do not evaluate on reinforcement learning tasks, as in the MAML paper).

My opinion: This seems like an important advance in meta-learning, as it is much more computationally efficient than MAML while still achieving similar levels of performance.
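
For concreteness, here is a minimal sketch of the Reptile outer loop as I understand it from the paper: run a few steps of SGD on a sampled task, then move the initialization part of the way toward the adapted parameters. The `sample_task` and `task_loss_grad` helpers are hypothetical stand-ins for your own task distribution and model, and the hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

# Hypothetical helpers (not from the paper): sample_task() returns a task,
# and task_loss_grad(task, params) returns the gradient of that task's loss
# at the given parameter vector.

def reptile_step(params, sample_task, task_loss_grad,
                 inner_lr=0.02, inner_steps=5, outer_lr=0.1):
    """One outer-loop iteration of Reptile (serial version).

    1. Sample a task.
    2. Run a few steps of SGD on that task, starting from the current
       initialization `params`, to get adapted parameters `adapted`.
    3. Move the initialization toward the adapted parameters:
       params <- params + outer_lr * (adapted - params).
    """
    task = sample_task()
    adapted = params.copy()
    for _ in range(inner_steps):
        adapted -= inner_lr * task_loss_grad(task, adapted)
    return params + outer_lr * (adapted - params)

def reptile_train(init_params, sample_task, task_loss_grad, iterations=1000):
    """Repeatedly apply the Reptile outer update to the initialization."""
    params = np.asarray(init_params, dtype=float)
    for _ in range(iterations):
        params = reptile_step(params, sample_task, task_loss_grad)
    return params
```

At first glance this looks like joint training on all tasks; if I recall the paper correctly, the Taylor-expansion analysis is what shows the update also contains a MAML-like term that rewards within-task generalization.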

Read more: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Technical AI alignment

Technical agendas and prioritization

Inverse Reinforcement Learning and Inferring Human Preference with Dylan Hadfield-Menell (Lucas Perry and Dylan Hadfield-Menell): A few weeks ago, Lucas Perry interviewed Dylan Hadfield-Menell on the FLI podcast about his research (which includes papers like Cooperative Inverse Reinforcement Learning, The Off-Switch Game, and Inverse Reward Design). They discussed a variety of topics, including the motivations behind Dylan's research, future directions, and thoughts on hard problems such as corrigibility and preference aggregation.

My opinion: This is probably most useful for understanding the motivations behind many of Dylan's papers and how they all tie into each other, which can be hard to glean just from reading the papers. There were also a lot of framings of problems that felt useful to me and that I haven't seen elsewhere.

Learning human intent

Zero-Shot Visual Imitation (Deepak Pathak, Parsa Mahmoudieh et al)

Reward learning theory

Reward function learning: the value function and Reward function learning: the learning process (Stuart Armstrong): These posts introduce a theoretical framework for reward learning, where a reward learning algorithm is modeled as something that produces a probability distribution over reward functions given a history and current policy. With such a general notion of reward learning, it becomes hard to define the value function: while we still want something like a sum of expected rewards, it is no longer clear how to take an expectation over the reward function, given that the distribution over it can change over time. Most plausible ways of doing this lead to time-inconsistent decisions, but one works well. The second post turns to the learning process and analyzes properties that it would be nice to have. In the worst case, we can get quite pathological behavior, but of course we get to choose the learning algorithm so we can avoid worst-case behavior. In general, we would want our learning algorithm to be unriggable and/or uninfluenceable, but this is not possible when learning from humans, since different policies on the AI's part will lead to it learning different rewards.

My opinion: I like this theoretical analysis that shows what could go wrong with processes that learn preferences. I did find it a bit hard to connect the ideas in these posts with concrete reward learning algorithms (such as inverse reinforcement learning); it seems plausible to me that if I properly understood what the formal definitions of unriggable and uninfluenceable meant in the IRL setting, I wouldn't view them as desirable.
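
To illustrate the ambiguity the first post is pointing at, here is a sketch in my own notation (not necessarily the notation or exact definitions from the posts). Suppose the learning process gives a distribution P(R | h) over reward functions after history h; the question is which distribution to take the expectation with respect to when scoring a policy.

```latex
% Two candidate value functions for a policy \pi at time t, given a
% reward-learning process P(R | h). Notation is mine, not the posts'.
%
% (a) uses the distribution over rewards given the current history h_t;
% (b) uses the distribution given the full history h_T at the end of the
%     episode. Because P(R | h) shifts as the history grows, definitions
%     in the style of (a) can recommend a plan at time t that the agent
%     no longer endorses at time t+1, i.e. time-inconsistent decisions.
\[
  V_t(\pi) \;=\; \mathbb{E}_{h_T \sim \pi}\Big[\textstyle\sum_{R} P(R \mid h_t)\, R(h_T)\Big],
  \qquad
  V'_t(\pi) \;=\; \mathbb{E}_{h_T \sim \pi}\Big[\textstyle\sum_{R} P(R \mid h_T)\, R(h_T)\Big].
\]
```

See the posts themselves for which definitions turn out to be time-inconsistent and which one works well.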

Forecasting

Double Cruxing the AI Foom debate (agilecaveman)

Critiques (Alignment)

The seven deadly sins of AI predictions (Rodney Brooks): This is an older article that I was sent recently, which argues against AI risk and the idea that we will have AGI soon. It generally argues that AGI proponents are mistaken about the current capabilities of AI and about how long it will take to make progress in AGI research.

My opinion: This article is aimed at refuting the superintelligent, perfectly-rational agent model of AGI, and so it feels to me like it's attacking a strawman of the argument for AI risk; that said, many people do seem to have beliefs similar to the ones he's arguing against. I partially agree with some of his criticisms and disagree with others, but overall I think most of the arguments are reasonable ones and worth knowing about.

Miscellaneous (Alignment)

Value Alignment Map (FLI): This is a gigantic graph of many of the concepts in the AI risk space. Each concept has a description and links to existing literature, and by clicking around in the map I found several interesting links I hadn't seen before.

My opinion: This map is so large that I can't actually use it to get a bird's-eye view of the entire space, but it seems quite useful for looking at a local region, and as a starting point to explore one particular aspect more deeply.

AI strategy and policy

AI in the UK: ready, willing and able?

EU Member States sign up to cooperate on Artificial Intelligence

AI capabilities

Reinforcement learning

A Study on Overfitting in Deep Reinforcement Learning (Chiyuan Zhang et al)

TDM: From Model-Free to Model-Based Deep Reinforcement Learning (Vitchyr Pong)

Deep learning

Reptile: A Scalable Meta-Learning Algorithm (Alex Nichol et al): Summarized in the highlights!

Phrase-Based & Neural Unsupervised Machine Translation (Guillaume Lample et al)

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms (Avital Oliver, Augustus Odena, Colin Raffel et al)

News

Summit on Machine Learning meets Formal Methods: This is a one-day summit on July 13 that is part of the Federated Logic Conference. This seems like an unusually good venue to think about how to apply formal methods to AI systems; in particular, I'm impressed by the list of speakers, which includes a variety of experts in both fields.
