[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot

Find all Align­ment Newslet­ter re­sources here. In par­tic­u­lar, you can sign up, or look through this spread­sheet of all sum­maries that have ever been in the newslet­ter. I’m always happy to hear feed­back; you can send it to me by re­ply­ing to this email.

Au­dio ver­sion here (may not be up yet).


Ar­tifi­cial In­tel­li­gence, Values and Align­ment (Ia­son Gabriel) (sum­ma­rized by Ro­hin): This pa­per from a Deep­Mind au­thor con­sid­ers what it would mean to al­ign an AI sys­tem. It first makes a dis­tinc­tion be­tween the tech­ni­cal and nor­ma­tive as­pects of the AI al­ign­ment prob­lem. Roughly, the nor­ma­tive as­pect asks, “what should our AI sys­tems do?”, while the tech­ni­cal as­pect asks, “given we know what our AI sys­tems should do, how do we get them to do it?”. The au­thor ar­gues that these two ques­tions are in­ter­re­lated and should not be solved sep­a­rately: for ex­am­ple, the cur­rent suc­cess of deep re­in­force­ment learn­ing in which we max­i­mize ex­pected re­ward sug­gests that it would be much eas­ier to al­ign AI to a util­i­tar­ian frame­work in which we max­i­mize ex­pected util­ity, as op­posed to a de­on­tolog­i­cal or Kan­tian frame­work.

The paper then explores the normative aspect, in both the single-human and multiple-human cases. When there’s only one human, we must grapple with the problem of what to align our AI system to. The paper considers six possibilities: instructions, expressed intentions, revealed preferences, informed preferences, interests, and values, but doesn’t come to a conclusion about which is best. When there are multiple humans, we must also deal with the fact that different people disagree on values. The paper analyzes three possibilities: aligning to a global notion of morality (e.g. “basic human rights”), doing what people would prefer from behind a veil of ignorance, and pursuing values that are determined by a democratic process (the domain of social choice theory).

See also Im­port AI #183

Ro­hin’s opinion: I’m ex­cited to see more big-pic­ture thought about AI al­ign­ment out of Deep­Mind. This newslet­ter (and I) tend to fo­cus a lot more on the tech­ni­cal al­ign­ment prob­lem than the nor­ma­tive one, partly be­cause there’s more work on it, but also partly be­cause I think it is the more ur­gent prob­lem (a con­tro­ver­sial po­si­tion).

Towards a Human-like Open-Domain Chatbot (Daniel Adiwardana et al) (summarized by Matthew): This paper presents a chatbot called Meena that reaches near human-level performance on measures of human likeness. The authors mined social media to find 341 GB of public domain conversations, and trained an Evolved Transformer on those conversations. To test its performance, they devised a metric they call the Sensibleness and Specificity Average (SSA), which measures how much sense the chatbot’s responses make in context, as well as whether they are specific. SSA was tightly correlated with perplexity and a subjective measure of human likeness, suggesting that optimizing for perplexity will translate to greater conversational ability. Meena substantially improved on the state of the art, including both hand-crafted bots like Mitsuku and the neural model DialoGPT, though it still falls short of human performance. You can read some conversation transcripts here; many of the responses from Meena are very human-like.
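To make the SSA metric concrete, here is a minimal sketch of how such a score could be computed from per-response human labels. The `Label` class, the `ssa` function, and the example labels are hypothetical and not taken from the paper's code; the sketch also assumes that a response judged not sensible is counted as not specific.

```python
from dataclasses import dataclass


@dataclass
class Label:
    sensible: bool  # does the response make sense in context?
    specific: bool  # is the response specific to the context?


def ssa(labels):
    """Average of the sensibleness rate and the specificity rate."""
    if not labels:
        raise ValueError("need at least one labelled response")
    n = len(labels)
    sensibleness = sum(label.sensible for label in labels) / n
    # Assumption: a response that is not sensible never counts as specific.
    specificity = sum(label.sensible and label.specific for label in labels) / n
    return (sensibleness + specificity) / 2


# Made-up labels for four chatbot responses (illustration only).
labels = [
    Label(sensible=True, specific=True),
    Label(sensible=True, specific=False),   # e.g. a vague "I don't know"
    Label(sensible=False, specific=False),
    Label(sensible=True, specific=True),
]
score = ssa(labels)
```

With these made-up labels, three of four responses are sensible and two of four are specific, so the score is the average of 0.75 and 0.5. A vague but sensible reply lowers specificity without lowering sensibleness.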

See also Im­port AI #183

Matthew’s opinion: Pre­vi­ously I be­lieved that good chat­bots would be hard to build, since it is challeng­ing to find large datasets of high-qual­ity pub­lished con­ver­sa­tions. Given the very large dataset that the re­searchers were able to find, I no longer think this is a ma­jor bar­rier for chat­bots. It’s im­por­tant to note that this re­sult does not im­ply that a strong Tur­ing test will soon be passed: the au­thors them­selves note that SSA over­es­ti­mates the abil­ities of Meena rel­a­tive to hu­mans. Since hu­mans are of­ten vague in their con­ver­sa­tions, eval­u­at­ing hu­man con­ver­sa­tion with SSA yields a rel­a­tively low score. Fur­ther­more, a strong Tur­ing test would in­volve a judge ask­ing ques­tions de­signed to trip AI sys­tems, and we are not yet close to a sys­tem that could fool such judges.

Tech­ni­cal AI alignment

Mesa optimization

In­ner al­ign­ment re­quires mak­ing as­sump­tions about hu­man val­ues (Matthew Bar­nett) (sum­ma­rized by Ro­hin): Typ­i­cally, for in­ner al­ign­ment, we are con­sid­er­ing how to train an AI sys­tem that effec­tively pur­sues an outer ob­jec­tive func­tion, which we as­sume is already al­igned. Given this, we might think that the in­ner al­ign­ment prob­lem is in­de­pen­dent of hu­man val­ues: af­ter all, pre­sum­ably the outer ob­jec­tive func­tion already en­codes hu­man val­ues, and so if we are able to al­ign to an ar­bi­trary ob­jec­tive func­tion (some­thing that pre­sum­ably doesn’t re­quire hu­man val­ues), that would solve in­ner al­ign­ment.

This post ar­gues that this ar­gu­ment doesn’t work: in prac­tice, we only get data from the outer ob­jec­tive on the train­ing dis­tri­bu­tion, which isn’t enough to uniquely iden­tify the outer ob­jec­tive. So, solv­ing in­ner al­ign­ment re­quires our agent to “cor­rectly” gen­er­al­ize from the train­ing dis­tri­bu­tion to the test dis­tri­bu­tion. How­ever, the “cor­rect” gen­er­al­iza­tion de­pends on hu­man val­ues, sug­gest­ing that a solu­tion to in­ner al­ign­ment must de­pend on hu­man val­ues as well.

Rohin’s opinion: I certainly agree that we need some information that leads to the “correct” generalization, though this could be something like ensuring that the agent is corrigible (AN #35). Whether this depends on human “values” depends on what you mean by “values”.

Learn­ing hu­man intent

A Frame­work for Data-Driven Robotics (Serkan Cabi et al) (sum­ma­rized by Ni­cholas): This pa­per pre­sents a frame­work for us­ing a mix of task-ag­nos­tic data and task-spe­cific re­wards to learn new tasks. The pro­cess is as fol­lows:

1. A hu­man tele­op­er­ates the robot to provide a demon­stra­tion. This cir­cum­vents the ex­plo­ra­tion prob­lem, by di­rectly show­ing the robot the rele­vant states.

2. All of the robot’s sen­sory in­put is saved to Nev­erEnd­ing Stor­age (NES), which stores data from all tasks for fu­ture use.

3. Hu­mans an­no­tate a sub­set of the NES data via task-spe­cific re­ward sketch­ing, where hu­mans draw a curve show­ing progress to­wards the goal over time (see pa­per for more de­tails on their in­ter­face).

4. The la­bel­led data is used to train a re­ward model.

5. The agent is trained us­ing all the NES data, with the re­ward model pro­vid­ing re­wards.

6. At test-time, the robot con­tinues to save data to the NES.

They then use this ap­proach with a robotic arm on a few ob­ject ma­nipu­la­tion tasks, such as stack­ing the green ob­ject on top of the red one. They find that on these tasks, they can an­no­tate re­wards at hun­dreds of frames per minute.
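The data flow in steps 2 to 5 can be sketched in a few lines. This is a toy illustration under strong assumptions, not the paper's system: observations are a single number (say, distance to the target) rather than camera images, the reward model is a one-variable least-squares fit rather than a neural network, and all names (`nes`, `sketches`, `fit_linear_reward`) are hypothetical. The agent training step itself is omitted.

```python
def fit_linear_reward(features, rewards):
    """Least-squares fit of reward ~ a * feature + b (stand-in for a reward model)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(rewards) / n
    var_x = sum((x - mean_x) ** 2 for x in features)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, rewards))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Step 2: hypothetical storage holding every episode from every task.
nes = {
    "demo_0": [1.0, 0.8, 0.5, 0.2, 0.0],
    "demo_1": [1.0, 0.9, 0.9, 0.6, 0.3],
    "test_0": [1.0, 0.7, 0.4],
}

# Step 3: a human sketches per-frame progress for a subset of episodes.
sketches = {"demo_0": [0.0, 0.2, 0.5, 0.8, 1.0]}

# Step 4: train the reward model on the annotated frames only.
train_x, train_y = [], []
for episode_id, sketch in sketches.items():
    train_x.extend(nes[episode_id])
    train_y.extend(sketch)
reward_model = fit_linear_reward(train_x, train_y)

# Step 5 (data preparation): label all stored frames, annotated or not,
# so that an agent could then be trained on the full dataset.
relabelled = {
    episode_id: [(obs, reward_model(obs)) for obs in frames]
    for episode_id, frames in nes.items()
}
```

The point of the design is that only `demo_0` needed human annotation, yet every stored frame ends up with a reward, including data collected at test time.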

Ni­cholas’s opinion: I’m happy to see re­ward mod­el­ing be­ing used to achieve new ca­pa­bil­ities re­sults, pri­mar­ily be­cause it may lead to more fo­cus from the broader ML com­mu­nity on a prob­lem that seems quite im­por­tant for safety. Their re­ward sketch­ing pro­cess is quite effi­cient and hav­ing more re­ward data from hu­mans should en­able a more faith­ful model, at least on tasks where hu­mans are able to an­no­tate ac­cu­rately.

Mis­cel­la­neous (Align­ment)

Does Bayes Beat Good­hart? (Abram Dem­ski) (sum­ma­rized by Flo): It has been claimed (AN #22) that Good­hart’s law might not be a prob­lem for ex­pected util­ity max­i­miza­tion, as long as we cor­rectly ac­count for our un­cer­tainty about the cor­rect util­ity func­tion.

This post ar­gues that Bayesian ap­proaches are in­suffi­cient to get around Good­hart. One prob­lem is that with in­suffi­cient over­lap be­tween pos­si­ble util­ity func­tions, some util­ity func­tions might es­sen­tially be ig­nored when op­ti­miz­ing the ex­pec­ta­tion, even if our prior as­signs pos­i­tive prob­a­bil­ity to them. How­ever, in re­al­ity, there is likely con­sid­er­able over­lap be­tween the util­ity func­tions in our prior, as they are se­lected to fit our in­tu­itions.
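A small numeric sketch may help with the overlap point. The prior, the action names, and the utility numbers below are invented for illustration and do not come from the post. Each action maps to its utility under two candidate utility functions, and the optimizer picks the action with the highest expected utility under the prior.

```python
def expected_utility(outcomes, prior):
    """Prior-weighted average of an action's utility under each candidate function."""
    return sum(p * u for p, u in zip(prior, outcomes))


def best_action(actions, prior):
    """Name of the action that maximizes expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], prior))


# Hypothetical prior: 90% on utility function U1, 10% on U2.
prior = (0.9, 0.1)

# No overlap: every action is good under one candidate and worthless under the other.
disjoint = {"a": (1.0, 0.0), "b": (0.0, 1.0)}

# Overlap: action "d" is nearly optimal under both candidates.
overlapping = dict(disjoint, d=(0.95, 0.9))

choice_disjoint = best_action(disjoint, prior)
choice_overlap = best_action(overlapping, prior)
```

In the disjoint case the optimizer picks `a`, which scores zero under U2 even though the prior gives U2 positive probability. Once an action that both candidates rate highly is available, it wins instead.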

More severely, bad priors can lead to systematic biases in a Bayesian’s expectations, especially given embeddedness. As an extreme example, the prior might assign zero probability to the correct utility function. Calibrated learning, rather than Bayesian learning, can help with this, but only for regressional Goodhart (Recon #5). Adversarial Goodhart, where another agent tries to exploit the difference between your utility and your proxy, seems to also require randomization like quantilization (AN #48).

Flo’s opinion: The de­gree of over­lap be­tween util­ity func­tions seems to be pretty cru­cial (also see here (AN #82)). It does seem plau­si­ble for the Bayesian ap­proach to work well with­out the cor­rect util­ity in the prior if there was a lot of over­lap be­tween the util­ities in the prior and the true util­ity. How­ever, I am some­what scep­ti­cal of our abil­ity to get re­li­able es­ti­mates for that over­lap.

Other progress in AI

Deep learning

Deep Learning for Symbolic Mathematics (Guillaume Lample et al) (summarized by Matthew): This paper demonstrates the ability of sequence-to-sequence models to outperform computer algebra systems (CAS) at the tasks of symbolic integration and solving ordinary differential equations. Since finding the derivative of a function is usually easier than integration, the authors built a large training set by generating random mathematical expressions, and then using these expressions as the labels for their derivatives. The mathematical expressions were formulated as syntax trees, and mapped to sequences by writing them in Polish notation. These sequences were, in turn, used to train a transformer model. While their model outperformed top CAS on the training data set, and could compute answers much more quickly than the CAS could, tests of generalization were mixed: importantly, the model did not generalize extremely well to datasets that were generated using different techniques than the training dataset.
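The data generation idea can be sketched as follows: build an expression tree, differentiate it symbolically, and emit the pair (derivative, expression) as Polish-notation token sequences. This is a simplified illustration, not the authors' generator. The tuple-based tree encoding, the small operator set, and the function names are assumptions, and unlike a real pipeline the derivative is left unsimplified.

```python
import random


def to_prefix(expr):
    """Flatten a syntax tree into a Polish (prefix) notation token list."""
    if isinstance(expr, tuple):
        op, *args = expr
        tokens = [op]
        for arg in args:
            tokens.extend(to_prefix(arg))
        return tokens
    return [str(expr)]


def derivative(expr):
    """Symbolic d/dx for trees built from 'x', integers, '+', '*', 'sin', 'cos'."""
    if expr == "x":
        return 1
    if not isinstance(expr, tuple):
        return 0  # integer constant
    op, *args = expr
    if op == "+":
        return ("+", derivative(args[0]), derivative(args[1]))
    if op == "*":  # product rule
        f, g = args
        return ("+", ("*", derivative(f), g), ("*", f, derivative(g)))
    if op == "sin":  # chain rule
        return ("*", ("cos", args[0]), derivative(args[0]))
    if op == "cos":
        return ("*", ("*", -1, ("sin", args[0])), derivative(args[0]))
    raise ValueError(f"unknown operator {op!r}")


def random_expr(rng, depth=2):
    """Sample a small random expression tree."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", rng.randint(1, 5)])
    op = rng.choice(["+", "*", "sin", "cos"])
    if op in ("sin", "cos"):
        return (op, random_expr(rng, depth - 1))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def make_pair(expr):
    """Model input is the derivative; the training label is the original expression."""
    return to_prefix(derivative(expr)), to_prefix(expr)


# Fixed example: F(x) = x * sin(x)
source, target = make_pair(("*", "x", ("sin", "x")))

# Random examples for a toy training set.
rng = random.Random(0)
dataset = [make_pair(random_expr(rng)) for _ in range(5)]
```

For the fixed example, the target sequence is `* x sin x`, and the input sequence is the unsimplified product-rule expansion of its derivative.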

Matthew’s opinion: At first this pa­per ap­peared more am­bi­tious than Sax­ton et al. (2019), but it ended up with more pos­i­tive re­sults, even though the pa­pers used the same tech­niques. There­fore, my im­pres­sion is not that we re­cently made rapid progress on in­cor­po­rat­ing math­e­mat­i­cal rea­son­ing into neu­ral net­works; rather, I now think that the tasks of in­te­gra­tion and solv­ing differ­en­tial equa­tions are sim­ply well-suited for neu­ral net­works.

Un­su­per­vised learning

Gen­er­a­tive Teach­ing Net­works: Ac­cel­er­at­ing Neu­ral Ar­chi­tec­ture Search by Learn­ing to Gen­er­ate Syn­thetic Train­ing Data (Felipe Pet­roski Such et al) (sum­ma­rized by Sud­han­shu): The Gen­er­a­tive Teach­ing Net­works (GTN) pa­per breaks new ground by train­ing gen­er­a­tors that pro­duce syn­thetic data that can en­able learner neu­ral net­works to learn faster than when train­ing on real data. The pro­cess is as fol­lows: The gen­er­a­tor pro­duces syn­thetic train­ing data by trans­form­ing some sam­pled noise vec­tor and la­bel; a newly-ini­tial­ized learner is trained on this syn­thetic data and eval­u­ated on real data; the er­ror sig­nal from this eval­u­a­tion is back­prop­a­gated to the gen­er­a­tor via meta-gra­di­ents, to en­able it to pro­duce syn­thetic sam­ples that will train the learner net­works bet­ter. They also demon­strate that their cur­ricu­lum learn­ing var­i­ant, where the in­put vec­tors and their or­der are learned along with gen­er­a­tor pa­ram­e­ters, is es­pe­cially pow­er­ful at teach­ing learn­ers with few sam­ples and few steps of gra­di­ent de­scent.
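The loop described above can be shown on a deliberately tiny problem. The sketch below is a toy under heavy assumptions and bears no resemblance to the paper's scale: the "generator" is a single learned label `y_syn` for one fixed synthetic input, the learner is a one-parameter linear model trained for one SGD step, and the meta-gradient is derived by hand with the chain rule instead of automatic differentiation. All names and constants are hypothetical.

```python
REAL_DATA = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # real task: y = 3x
INNER_LR = 0.1   # learner's step size
META_LR = 1.0    # step size for updating the synthetic label
X_SYN = 1.0      # fixed synthetic input; only its label is learned


def train_learner(y_syn):
    """One SGD step on the synthetic example, from a freshly initialized w = 0."""
    w0 = 0.0
    grad = 2.0 * (w0 * X_SYN - y_syn) * X_SYN  # d/dw of (w * x - y)^2
    return w0 - INNER_LR * grad


def real_loss(w):
    """Mean squared error of the trained learner on the real data."""
    return sum((w * x - y) ** 2 for x, y in REAL_DATA) / len(REAL_DATA)


def meta_gradient(y_syn):
    """d(real_loss)/d(y_syn), via the chain rule through the learner's update."""
    w1 = train_learner(y_syn)
    dloss_dw1 = sum(2.0 * (w1 * x - y) * x for x, y in REAL_DATA) / len(REAL_DATA)
    dw1_dysyn = 2.0 * INNER_LR * X_SYN  # because w1 = 2 * INNER_LR * X_SYN * y_syn
    return dloss_dw1 * dw1_dysyn


y_syn = 0.0
for _ in range(200):
    y_syn -= META_LR * meta_gradient(y_syn)

trained_w = train_learner(y_syn)
```

In this toy the learned label settles near 15, far from the "realistic" label of 3 for that input, because that is the value that brings a fresh learner to `w = 3` in a single step. The synthetic data is tuned for how well it teaches, not for how closely it resembles the real data.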

They apply their system to neural architecture search, and show an empirical correlation between the performance of a learner on synthetic data and its eventual performance when trained on real data. In this manner, they make the argument that data from a trained GTN can be used to cheaply assess the likelihood that a given network will learn the real task successfully, and hence GTN data can tremendously speed up architecture search.

Sud­han­shu’s opinion: I re­ally like this pa­per; I think it shines a light in an in­ter­est­ing new di­rec­tion, and I look for­ward to see­ing fu­ture work that builds on this in the­o­ret­i­cal, mechanis­tic, and ap­plied man­ners. On the other hand, I felt they did gloss over how ex­actly they do cur­ricu­lum learn­ing, and their re­in­force­ment learn­ing ex­per­i­ment was a lit­tle un­clear to me.

I think the im­pli­ca­tions of this work are enor­mous. In a fu­ture where we might be limited by the ma­tu­rity of available simu­la­tion plat­forms or in­un­dated by del­uges of data with lit­tle marginal in­for­ma­tion, this ap­proach can cir­cum­vent such prob­lems for the se­lec­tion and (pre)train­ing of suit­able stu­dent net­works.

Read more: Blog post


Ju­nior Re­search As­sis­tant and Pro­ject Man­ager role at GCRI (sum­ma­rized by Ro­hin): This job is available im­me­di­ately, and could be full-time or part-time. GCRI also cur­rently has a call for ad­visees and col­lab­o­ra­tors.

Re­search As­so­ci­ate and Se­nior Re­search As­so­ci­ate at CSER (sum­ma­rized by Ro­hin): Ap­pli­ca­tion dead­line is Feb 16.

Copy­right © 2020 Ro­hin Shah, All rights re­served.
