Alignment Newsletter #48

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter.

Highlights

Quan­tiliz­ers: A Safer Alter­na­tive to Max­i­miz­ers for Limited Op­ti­miza­tion and When to use quan­tiliza­tion (Jes­sica Tay­lor and Ryan Carey): A key worry with AI al­ign­ment is that if we max­i­mize ex­pected util­ity for some util­ity func­tion cho­sen by hand, we will likely get un­in­tended side effects that score highly by the util­ity func­tion but are nev­er­the­less not what we in­tended. We might hope to lev­er­age hu­man feed­back to solve this: in par­tic­u­lar, an AI sys­tem that sim­ply mimics hu­man ac­tions would of­ten be de­sir­able. How­ever, mimicry can only achieve hu­man perfor­mance, and can­not im­prove upon it. The first link is a 2015 pa­per that in­tro­duces quan­tiliza­tion, which in­ter­po­lates be­tween these two ex­tremes to im­prove upon hu­man perfor­mance while bound­ing the po­ten­tial (ex­pected) loss from un­in­tended side effects.

In par­tic­u­lar, let’s sup­pose that hu­mans have some policy γ (i.e. prob­a­bil­ity dis­tri­bu­tion over ac­tions). We eval­u­ate util­ity or perfor­mance us­ing a util­ity func­tion U, but we do not as­sume it is well-speci­fied—U can be any func­tion, in­clud­ing one we would not want to max­i­mize. Our goal is to de­sign a policy π that gets higher ex­pected U than γ (re­flect­ing our hope that U mea­sures util­ity well) with­out do­ing too much worse than γ in the worst case when U was as badly de­signed as pos­si­ble. We’ll con­sider a one-shot case: π is used to se­lect an ac­tion once, and then the game is over.

The core idea be­hind quan­tiliza­tion is sim­ple: if our policy only does things that the hu­man might have done, any ex­pected loss it in­curs cor­re­sponds to some loss that the hu­man could in­cur. So, let’s take our hu­man policy γ, keep only the top q-frac­tion of γ (as eval­u­ated by U), and then sam­ple an ac­tion from there. This defines our policy π_q, also called a q-quan­tilizer. For ex­am­ple, sup­pose the hu­man would choose A with prob­a­bil­ity 0.25, B with prob­a­bil­ity 0.5, and C with prob­a­bil­ity 0.25, and U(A) > U(B) > U(C). Then a (1/​4)-quan­tilizer would choose A with cer­tainty, a (1/​2)-quan­tilizer would choose ran­domly be­tween A and B, and a (3/​8)-quan­tilizer would choose A twice as of­ten as B.
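The construction above can be sketched in a few lines of Python. This is a minimal illustration for a finite action set, not code from the paper: the function name `quantilize`, the dictionary representation of γ, and the numeric utilities assigned to A, B and C are all assumptions made for the example. Ties in U are broken arbitrarily by sort order.

```python
from typing import Callable, Dict, Hashable


def quantilize(
    gamma: Dict[Hashable, float],
    utility: Callable[[Hashable], float],
    q: float,
) -> Dict[Hashable, float]:
    """Return the action distribution of a q-quantilizer over base policy gamma.

    Keeps the top q of gamma's probability mass (ranked by utility) and
    renormalizes by dividing by q. Hypothetical helper for illustration.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    remaining = q  # probability mass of gamma still to be kept
    policy: Dict[Hashable, float] = {}
    for action in sorted(gamma, key=utility, reverse=True):
        if remaining <= 1e-12:
            break
        # Take as much of this action's mass as still fits in the top q.
        mass = min(gamma[action], remaining)
        if mass > 0:
            policy[action] = mass / q
        remaining -= mass
    return policy


if __name__ == "__main__":
    # The example from the text; the utility values are made up and only
    # need to satisfy U(A) > U(B) > U(C).
    human = {"A": 0.25, "B": 0.5, "C": 0.25}
    U = {"A": 3.0, "B": 2.0, "C": 1.0}
    for q in (1 / 4, 1 / 2, 3 / 8):
        print(q, quantilize(human, U.get, q))
```

With q = 3/8 the quantilizer keeps all of A's 0.25 and only 0.125 of B's mass, which is where the "A twice as often as B" ratio comes from.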

Note that even if π_q places all of its probability on the worst possible actions (because U was badly designed), by construction the human had probability q of doing the same thing as π_q, and so the expected cost of π_q can be at most (1/q) times the expected cost of the human policy γ. In fact, if we have no other knowledge, a q-quantilizer maximizes expected utility U subject to the constraint that its expected cost is never more than (1/q) times that of γ.
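The (1/q) bound can be written out in one line. The sketch below uses a nonnegative cost function c over actions; the symbol c is notation introduced here for illustration (it stands for whatever harm U fails to capture) and is assumed to be nonnegative.

```latex
% By construction a q-quantilizer never upweights an action by more than 1/q:
%   \pi_q(a) \le \gamma(a) / q   for every action a.
% Hence, for any cost function c(a) \ge 0:
\mathbb{E}_{a \sim \pi_q}[c(a)]
  = \sum_a \pi_q(a)\, c(a)
  \le \frac{1}{q} \sum_a \gamma(a)\, c(a)
  = \frac{1}{q}\, \mathbb{E}_{a \sim \gamma}[c(a)].
```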

As soon as you move to the set­ting with mul­ti­ple ac­tions, if you choose each ac­tion us­ing quan­tiliza­tion, then your worst case bound is ex­po­nen­tial in the num­ber of ac­tions. If you as­sume the cost for each ac­tion is in­de­pen­dent, you re­cover the guaran­tees, but this is not a re­al­is­tic as­sump­tion (as the au­thors note). Long-term plans are very good or very bad be­cause all of the ac­tions build on each other to achieve some goal, so the costs are not go­ing to be in­de­pen­dent.
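To see where the exponential comes from, here is a short sketch under the assumption that each of n actions is chosen by its own q-quantilizer and that the cost c is a nonnegative function of the whole sequence. The symbols c, n and a_{1:n} are notation introduced for this illustration.

```latex
% Each step can upweight its action by at most 1/q relative to the human,
% so a whole sequence can be upweighted by at most (1/q)^n:
\pi(a_{1:n}) = \prod_{t=1}^{n} \pi_q(a_t \mid a_{1:t-1})
  \le \prod_{t=1}^{n} \frac{\gamma(a_t \mid a_{1:t-1})}{q}
  = q^{-n}\, \gamma(a_{1:n}),
\qquad\text{hence}\qquad
\mathbb{E}_{\pi}[c(a_{1:n})] \le q^{-n}\, \mathbb{E}_{\gamma}[c(a_{1:n})].
% Example: q = 1/10 and n = 6 already allows a factor of 10^6.
```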

The second link is a recent post that further analyzes quantilization. It points out that the theoretical analysis in the paper assumes that deviating from the human policy can only lead to costs. If the deviation could also forgo benefits, then the theorems no longer apply. Perhaps the bottom (1-q) fraction of actions actually has strong positive benefits that we failed to model in U. Then by eliminating those actions, we may have lost arbitrary amounts of expected value. In these situations, the only way to bound the expected regret is by exact mimicry. It also points out that if you are aiming to simultaneously do well both on U and the worst-case bound, then typically imitation will be better, since adding any optimization can drastically weaken the worst-case bound and usually will not make U better by the same amount. Quantilization makes sense when there is a “sweet-spot of actions that are pretty common but substantially outperform imitation”.

Ro­hin’s opinion: The ex­po­nen­tial blowup in po­ten­tial loss with mul­ti­ple ac­tions would make this pro­hibitive, but of course you could in­stead view the full se­quence of ac­tions (i.e. tra­jec­tory) as a mega-ac­tion, and quan­tilize over this mega-ac­tion. In this case, a one-mil­lionth-quan­tilizer could choose from among the mil­lion best plans that a hu­man would make (as­sum­ing a well-speci­fied U), and any un­in­tended con­se­quences (that were in­ten­tion­ally cho­sen by the quan­tilizer) would have to be ones that a hu­man had a one-in-a-mil­lion chance of caus­ing to oc­cur, which quite plau­si­bly ex­cludes re­ally bad out­comes.
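Quantilizing over whole trajectories can be approximated by sampling, as sketched below. This is an illustration, not something taken from the paper or the post: the function name `quantilize_by_sampling`, the sampler interface, and the toy "left/right" human policy are all hypothetical. It draws plans from a model of the human, ranks them by U, and picks uniformly from the top q fraction of the samples.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def quantilize_by_sampling(
    sample_trajectory: Callable[[random.Random], T],
    utility: Callable[[T], float],
    q: float,
    n_samples: int,
    rng: random.Random,
) -> T:
    """Approximate a q-quantilizer over full trajectories ("mega-actions").

    Every returned trajectory is one the human model actually produced,
    so the quantilizer only does things the human might have done.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    samples = [sample_trajectory(rng) for _ in range(n_samples)]
    samples.sort(key=utility, reverse=True)
    keep = max(1, int(q * n_samples))  # size of the top-q slice
    return rng.choice(samples[:keep])


if __name__ == "__main__":
    # Toy human policy (assumption): three independent coin-flip steps.
    def human(rng: random.Random):
        return tuple(rng.choice(["left", "right"]) for _ in range(3))

    # Toy utility (assumption): number of "right" steps.
    def U(trajectory):
        return sum(step == "right" for step in trajectory)

    print(quantilize_by_sampling(human, U, q=0.05, n_samples=1000, rng=random.Random(0)))
```

The sampled version only approximates the true top-q slice of γ, and a one-in-a-million q would need far more than a million samples, so this is a way to see the idea rather than a practical recipe.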

Phrased this way, quan­tiliza­tion feels like an am­plifi­ca­tion of a hu­man policy. Un­like the am­plifi­ca­tion in iter­ated am­plifi­ca­tion, it does not try to pre­serve al­ign­ment, it sim­ply tries to bound how far away from al­ign­ment the re­sult­ing policy can di­verge. As a re­sult, you can’t iter­ate quan­tiliza­tion to get ar­bi­trar­ily good ca­pa­bil­ities. You might hope that hu­mans could learn from pow­er­ful AI sys­tems, grow more ca­pa­ble them­selves (while re­main­ing as safe as they were be­fore), and then the next quan­tiliz­ers would be more pow­er­ful.

It’s worth not­ing that the the­o­rem in the pa­per shows that, to the ex­tent that you think quan­tiliza­tion is in­suffi­cient for AI al­ign­ment, you need to make some other as­sump­tion, or find some other source of in­for­ma­tion, in or­der to do bet­ter, since quan­tiliza­tion is op­ti­mal for its par­tic­u­lar setup. For ex­am­ple, you could try to as­sume that U is at least some­what rea­son­able and not patholog­i­cally bad; or you could as­sume an in­ter­ac­tive set­ting where the hu­man can no­tice and cor­rect for any is­sues with the U-max­i­miz­ing plan be­fore it is ex­e­cuted; or you could not have U at all and ex­ceed hu­man perfor­mance through some other tech­nique.

I’m not very wor­ried about the is­sue that quan­tiliza­tion could forgo benefits that the hu­man policy had. It seems that even if this hap­pens, we could no­tice this, turn off the quan­tilizer, and fix the util­ity func­tion U so that it no longer ig­nores those benefits. (We wouldn’t be able to pre­vent the quan­tilizer from for­go­ing benefits of our policy that we didn’t know about, but that seems okay to me.)

Tech­ni­cal AI alignment

Iter­ated amplification

Can HCH epistem­i­cally dom­i­nate Ra­manu­jan? (Alex Zhu): Iter­ated am­plifi­ca­tion rests on the hope that we can achieve ar­bi­trar­ily high ca­pa­bil­ities with (po­ten­tially very large) trees of ex­plicit ver­bal break­downs of prob­lems. This is of­ten for­mal­ized as a ques­tion about HCH (AN #34). This post con­sid­ers the ex­am­ple of Srini­vasa Ra­manu­jan, who is “fa­mously known for solv­ing math prob­lems with sud­den and in­ex­pli­ca­ble flashes of in­sight”. It is not clear how HCH would be able to repli­cate this sort of rea­son­ing.

Learn­ing hu­man intent

Un­su­per­vised Vi­suo­mo­tor Con­trol through Distri­bu­tional Plan­ning Net­works (Ti­anhe Yu et al)

Syn­tax vs se­man­tics: alarm bet­ter ex­am­ple than ther­mo­stat (Stu­art Arm­strong): This post gives a new ex­am­ple that more clearly illus­trates the points made in a pre­vi­ous post (AN #26).

Prerequisites: Bridging syntax and semantics, empirically


Interpretability

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks (Anh Nguyen et al)

Ad­ver­sar­ial examples

Quantifying Perceptual Distortion of Adversarial Examples (Matt Jordan et al) (summarized by Dan H): This paper takes a step toward more general adversarial threat models by combining adversarial additive perturbations small in an l_p sense with spatially transformed adversarial examples, among other attacks. In this more general setting, they measure the size of perturbations by computing the SSIM between clean and perturbed samples, which has limitations but is on the whole better than the l_2 distance. This work shows, along with other concurrent works, that perturbation robustness under some threat models does not yield robustness under other threat models. Therefore the view that l_p perturbation robustness must be achieved before considering other threat models is made more questionable. The paper also contributes a large code library for testing adversarial perturbation robustness.

On the Sensitivity of Adversarial Robustness to Input Data Distributions (Gavin Weiguang Ding et al)

Forecasting

Pri­mates vs birds: Is one brain ar­chi­tec­ture bet­ter than the other? (Te­gan McCaslin): Progress in AI can be driven by both larger mod­els as well as ar­chi­tec­tural im­prove­ments (given suffi­cient data and com­pute), but which of these is more im­por­tant? One source of ev­i­dence comes from an­i­mals: differ­ent species that are closely re­lated will have similar neu­ral ar­chi­tec­tures, but po­ten­tially quite differ­ent brain sizes. This post com­pares in­tel­li­gence across birds and pri­mates: while pri­mates (and mam­mals more gen­er­ally) have a neo­cor­tex (of­ten used to ex­plain hu­man in­tel­li­gence), birds have a differ­ent, in­de­pen­dently-evolved type of cor­tex. Us­ing a sur­vey over non-ex­pert par­ti­ci­pants about how in­tel­li­gent differ­ent bird and pri­mate be­hav­ior is, it finds that there is not much differ­ence in in­tel­li­gence rat­ings be­tween birds and pri­mates, but that species with larger brains are rated as more in­tel­li­gent than those with smaller brains. This only sug­gests that there are at least two neu­ral ar­chi­tec­tures that work—it could still be a hard prob­lem to find them in the vast space of pos­si­ble ar­chi­tec­tures. Still, it is some ev­i­dence that at least in the case of evolu­tion, you get more in­tel­li­gence through more neu­rons, and ar­chi­tec­tural im­prove­ments are rel­a­tively less im­por­tant.

Rohin’s opinion: Upon reading the experimental setup I didn’t really know which way the answer was going to turn out, so I’m quite happy about now having another data point with which to understand learning dynamics. Of course, it’s not clear how data about evolution will generalize to AI systems. For example, architectural improvements probably require some hard-to-find insight that makes them hard to find via random search (imagine how hard it would be to invent CNNs by randomly trying stuff), while scaling up model size is easy, and so we might expect AI researchers to be differentially better at finding architectural improvements relative to scaling up model size (as compared to evolution).

Read more: In­ves­ti­ga­tion into the re­la­tion­ship be­tween neu­ron count and in­tel­li­gence across differ­ing cor­ti­cal architectures

Mis­cel­la­neous (Align­ment)

Quan­tiliz­ers: A Safer Alter­na­tive to Max­i­miz­ers for Limited Op­ti­miza­tion and When to use quan­tiliza­tion (Jes­sica Tay­lor and Ryan Carey): Sum­ma­rized in the high­lights!

Hu­man-Cen­tered Ar­tifi­cial In­tel­li­gence and Ma­chine Learn­ing (Mark O. Riedl)

AI strat­egy and policy

Stable Agree­ments in Tur­bu­lent Times (Cul­len O’Keefe): On the one hand we would like ac­tors to be able to co­op­er­ate be­fore the de­vel­op­ment of AGI by en­ter­ing into bind­ing agree­ments, but on the other hand such agree­ments are of­ten un­palat­able and hard to write be­cause there is a lot of un­cer­tainty, in­de­ter­mi­nacy and un­fa­mil­iar­ity with the con­se­quences of de­vel­op­ing pow­er­ful AI sys­tems. This makes it very hard to be con­fi­dent that any given agree­ment is ac­tu­ally net pos­i­tive for a given ac­tor. The key point of this re­port is that we can strike a bal­ance be­tween these two ex­tremes by agree­ing pre-AGI to be bound by de­ci­sions that are made post-AGI with the benefit of in­creased knowl­edge. It ex­am­ines five tools for this pur­pose: op­tions, im­pos­si­bil­ity doc­trines, con­trac­tual stan­dards, rene­go­ti­a­tion, and third-party re­s­olu­tion.

Ad­vice to UN High-level Panel on Digi­tal Co­op­er­a­tion (Luke Kemp et al)

Other progress in AI

Re­in­force­ment learning

Neu­ral MMO (OpenAI) (sum­ma­rized by Richard): Neu­ral MMO is “a mas­sively mul­ti­a­gent game en­vi­ron­ment for re­in­force­ment learn­ing agents”. It was de­signed to be per­sis­tent (with con­cur­rent learn­ing and no en­vi­ron­ment re­sets), large-scale, effi­cient and ex­pand­able. Agents need to tra­verse an en­vi­ron­ment to ob­tain food and wa­ter in or­der to sur­vive for longer (the met­ric for which they are re­warded), and are also able to en­gage in com­bat with other agents. Agents trained within a larger pop­u­la­tion ex­plore more and con­sis­tently out­perform those trained in smaller pop­u­la­tions (when eval­u­ated to­gether). The au­thors note that mul­ti­a­gent train­ing is a cur­ricu­lum mag­nifier, not a cur­ricu­lum in it­self, and that the en­vi­ron­ment must fa­cil­i­tate adap­tive pres­sures by al­low­ing a suffi­cient range of in­ter­ac­tions.

Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research (Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel) (summarized by Richard): The authors argue that the best solution to the problem of task generation is creating multi-agent systems where each agent must adapt to the others. These agents do so first by learning how to implement a high-level strategy, and then by adapting it based on the strategies of others. (The authors use the term “adaptive unit” rather than “agent” to emphasise that change can occur at many different hierarchical levels, and either by evolution or learning). This adaptation may be exogenous (driven by the need to respond to a changing environment) or endogenous (driven by a unit’s need to improve its own functionality). An example of the latter is a society implementing institutions which enforce cooperation between individuals. Since individuals will try to exploit these institutions, the process of gradually robustifying them can be considered an automatically-generated curriculum (aka autocurriculum).

Richard’s opinion: My guess is that mul­ti­a­gent learn­ing will be­come very pop­u­lar fairly soon. In ad­di­tion to this pa­per and the Neu­ral MMO pa­per, it was also a key part of the AlphaS­tar train­ing pro­cess. The im­pli­ca­tions of this re­search di­rec­tion for safety are still un­clear, and it seems valuable to ex­plore them fur­ther. One which comes to mind: the sort of de­cep­tive be­havi­our re­quired for treach­er­ous turns seems more likely to emerge from mul­ti­a­gent train­ing than from sin­gle-agent train­ing.

Long-Range Robotic Navigation via Automated Reinforcement Learning (Aleksandra Faust and Anthony Francis): How can we get robots that successfully navigate in the real world? One approach is to use a high-level route planner that uses a learned control policy over very short distances (10-15 meters). The control policy is learned using deep reinforcement learning, where the network architecture and reward shaping are also learned via neural architecture search (or at least something very similar). The simulations have enough noise that the learned control policy transfers well to new environments. Given this policy as well as a floorplan of the environment we want the robot to navigate in, we can build a graph of points on the floorplan, where there is an edge between two points if the robot can safely navigate between the two points using the learned controller (which I think is checked in simulation). At execution time, we can find a path to the goal in this graph, and move along the edges using the learned policy. They were able to build a graph for the four buildings at the Google main campus using 300 workers over 4 days. They find that the robots are very robust in the real world. See also Import AI.
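The graph-building and path-finding steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: `can_navigate` is a hypothetical stand-in for checking the learned controller in simulation, the 15-meter edge limit is taken loosely from the "10-15 meters" figure above, and using Dijkstra's algorithm for the path search is an assumption (the summary only says that a path is found).

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Graph = Dict[Point, List[Tuple[Point, float]]]


def build_graph(
    points: List[Point],
    can_navigate: Callable[[Point, Point], bool],
    max_edge_length: float = 15.0,
) -> Graph:
    """Connect two floorplan points if they are close enough and the
    (hypothetical) controller check says the robot can travel between them."""
    graph: Graph = {p: [] for p in points}
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            d = math.dist(a, b)
            if d <= max_edge_length and can_navigate(a, b):
                graph[a].append((b, d))
                graph[b].append((a, d))
    return graph


def shortest_path(graph: Graph, start: Point, goal: Point) -> Optional[List[Point]]:
    """Dijkstra's algorithm; returns the waypoint list or None if unreachable."""
    dist = {start: 0.0}
    prev: Dict[Point, Point] = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

At execution time the robot would follow the returned waypoints one edge at a time, handing each short hop to the learned control policy.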

Ro­hin’s opinion: This is a great ex­am­ple of a pat­tern that seems quite com­mon: once we au­to­mate tasks us­ing end-to-end train­ing that pre­vi­ously re­quired more struc­tured ap­proaches, new more com­plex tasks will arise that will use the end-to-end trained sys­tems as build­ing blocks in a big­ger struc­tured ap­proach. In this case, we can now train robots to nav­i­gate over short dis­tances us­ing end-to-end train­ing, and this has been used in a struc­tured ap­proach in­volv­ing graphs and way­points to cre­ate robots that can tra­verse larger dis­tances.

It’s also an example of what you can do when you have a ton of compute: for the learned controller, they learned both the network architecture and the reward shaping. About the only thing that had to be explicitly specified was the sparse true reward. (Although I’m sure in practice it took a lot of effort to get everything to actually work.)

Competitive Experience Replay (Hao Liu et al)

News

Q&A with Ja­son Ma­theny, Found­ing Direc­tor of CSET (Ja­son Ma­theny): The Cen­ter for Se­cu­rity and Emerg­ing Tech­nol­ogy has been an­nounced, with a $55 mil­lion grant from the Open Philan­thropy Pro­ject, and is hiring. While the cen­ter will work on emerg­ing tech­nolo­gies gen­er­ally, it will ini­tially fo­cus on AI, since de­mand for AI policy anal­y­sis has far out­paced sup­ply.

One area of fo­cus is the im­pli­ca­tions of AI on na­tional and in­ter­na­tional se­cu­rity. Cur­rent AI sys­tems are brit­tle and can eas­ily be fooled, im­ply­ing sev­eral safety and se­cu­rity challenges. What are these challenges, and how im­por­tant are they? How can we make sys­tems that are more ro­bust and miti­gate these prob­lems?

Another area is how to enable effective competition on AI in a global environment, while also cooperating on issues of safety, security and ethics. This will likely require measurement of investment flows, publications, data and hardware across countries, as well as management of talent and knowledge workflows.

See also Im­port AI.

Ro­hin’s opinion: It’s great to see a cen­ter for AI policy that’s run by a per­son who has wanted to con­sume AI policy anal­y­sis in the past (Ja­son Ma­theny was pre­vi­ously the di­rec­tor of IARPA). It’s in­ter­est­ing to see the ar­eas he fo­cuses on in this Q&A—it’s not what I would have ex­pected given my very lit­tle knowl­edge of AI policy.