Alignment Newsletter #31


Introducing the AI Alignment Forum (FAQ) (habryka): The Alignment Forum has officially launched! It aims to be the single online hub for researchers to have conversations about all the ideas in the field, while also helping new researchers get up to speed. While posting is restricted to members, all content is cross-posted to LessWrong, where anyone can engage with it. In addition, for the next few weeks there will be a daily post from one of three new sequences on embedded agency, iterated amplification, and value learning.

Rohin’s opinion: I’m excited for this forum, and will be collating the value learning sequence for its launch. Since these sequences are meant to teach some of the key ideas in AI alignment, I would probably end up highlighting every single post. Instead of that, I’m going to create new categories for each sequence and summarize them each week within the category, but you should treat them as if I had highlighted them.

Reinforcement Learning with Prediction-Based Rewards (Yuri Burda and Harri Edwards) (summarized by Richard): Researchers at OpenAI have beaten average human performance on Montezuma’s Revenge using a prediction-based curiosity technique called Random Network Distillation. A network with fixed random weights evaluates each state; another network with the same architecture is trained to predict the random network’s output, given its input. The agent receives an additional reward proportional to the predictor’s error on its current state. The idea behind the technique is that the predictor’s error will be higher on states different from those it’s been trained on, and so the agent will be rewarded for exploring them.

This paper follows up on their study of curiosity (AN #20), in which a predictor was trained to predict the next state directly, and the agent was rewarded when its error was high. However, this led to high reward on states that were unpredictable due to model limitations or stochasticity (e.g. the noisy-TV problem). By contrast, Random Network Distillation only requires the prediction of a deterministic function which is definitely within the class of functions representable by the predictor (since it has the same architecture as the random network).
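To make the mechanism concrete, here is a minimal numpy sketch of the RND bonus. This is a hypothetical toy version: the paper uses convolutional networks on pixels and gives the predictor the same architecture as the target, whereas this sketch fits a linear predictor against a small random MLP.

```python
import numpy as np

rng = np.random.default_rng(0)
IN, HID, OUT = 4, 32, 8

# Fixed random target network: its weights are never trained.
T1 = rng.normal(0.0, 1.0, (IN, HID))
T2 = rng.normal(0.0, 1.0, (HID, OUT)) / np.sqrt(HID)

def target(x):
    return np.maximum(x @ T1, 0.0) @ T2  # random ReLU features

# Predictor trained to match the target (a linear map, for brevity).
W = np.zeros((IN, OUT))

def intrinsic_reward(x):
    """Novelty bonus = predictor's squared error on state x."""
    return float(np.mean((x @ W - target(x)) ** 2))

# "Visit" states near the origin and train the predictor on them.
visited = rng.normal(0.0, 1.0, (512, IN))
for _ in range(500):
    batch = visited[rng.integers(0, 512, size=32)]
    err = batch @ W - target(batch)
    W -= 0.01 * batch.T @ err / 32  # gradient step on the mean squared error

familiar = float(np.mean([intrinsic_reward(s) for s in visited[:50]]))
novel = intrinsic_reward(np.full(IN, 5.0))  # far from anything visited
# The bonus is larger on the unfamiliar state, so exploration is rewarded.
```

Because the target is deterministic, the bonus shrinks on states the predictor has seen many times, which is exactly what distinguishes RND from next-state prediction in stochastic environments.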

Richard’s opinion: This is an important step forward for curiosity-driven agents. As the authors note in the paper, RND has the additional advantages of being simple to implement and flexible.

Technical AI alignment

Embedded agency sequence

Embedded Agents (Abram Demski and Scott Garrabrant): This post introduces embedded agency, a notion of “agent” that is more realistic than the one considered in mainstream AI, which is best formalized by AIXI. An embedded agent is one that is actually a part of the environment it is acting in, as opposed to our current AI agents, which model the environment as external to them. The problems around embedded agency fall into four main clusters, which future posts will talk about.

Rohin’s opinion: This post is a great summary of the sequence to come, and is intuitive and easy to understand. I strongly recommend reading the full post—I haven’t summarized it much because it already is a good summary.

Decision Theory (Abram Demski and Scott Garrabrant): The major issue with porting decision theory to the embedded setting is that there is no longer a clear, well-defined boundary between actions and outcomes, such that we can say “if I take this action, then this outcome occurs”. In an embedded setting, the agent is just another part of the environment, and so if the agent is reasoning about the environment, it can also reason about itself, and its reasoning can tell it something about what its actions will be. But if you know what action you are going to take, how do you properly think about the counterfactual “what if I had taken this other action”?

A formalization in logic, where counterfactuals are represented by logical implication, doesn’t work. If you know what your action is going to be, then the premise of the counterfactual (that you take some other action) is false, and you can conclude anything. The post gives a concrete example of a reasonable-looking agent which ends up choosing to take $5 when offered a choice between $5 and $10, because it can prove that “if I took $10, then I would get $0” (which is in fact true, since it took $5, not $10!). A formalization in probability theory doesn’t work either, because if you condition on an alternative action that you know you won’t take, you are conditioning on a probability-zero event. If you say that there is always some uncertainty in which action you take, or you force the agent to always explore with some small probability, then your agent is going to reason about alternative actions under the assumption that there was some hardware failure, or that it was forced to explore—this seems like the wrong way to reason about alternatives.
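The logical failure mode is easy to reproduce directly: read the counterfactual as material implication, and a false antecedent makes any consequent “provable”. A toy sketch using the post’s $5/$10 example:

```python
def implies(antecedent, consequent):
    """Material implication: a false antecedent makes the whole claim true."""
    return (not antecedent) or consequent

action = "take_5"                    # the agent in fact takes the $5
took_10 = (action == "take_10")

# With the antecedent false, the agent can "prove" contradictory counterfactuals:
would_get_10 = implies(took_10, 10 == 10)  # "if I took $10, I'd get $10"
would_get_0 = implies(took_10, 10 == 0)    # "if I took $10, I'd get $0"
# Both evaluate to True, so logical implication cannot serve as a counterfactual.
```

This is exactly why the agent in the post can consistently prove “if I took $10, I would get $0”: the proof is vacuous once the agent in fact takes the $5.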

Changing tack a bit, how would we think about “What if 2+2=3”? This seems like a pretty hard counterfactual for us to evaluate—it’s not clear what it means. There may just be no “correct” counterfactuals—but in this case we still need to figure out how intelligent agents like humans successfully consider alternative actions that they are not going to take, in order to make good decisions. One approach is Updateless Decision Theory (UDT), which takes the action your earlier self would have wanted to commit to, which comes closer to viewing the problem from the outside. While it neatly resolves many of the problems in decision theory, including counterfactual mugging (described in the post), it assumes that your earlier self can foresee all outcomes, which can’t happen for embedded agents because the environment is bigger than the agent and any world model can only be approximate (the subject of the next post).

Rohin’s opinion: Warning: Ramblings about topics I haven’t thought about much.

I’m certainly confused about how humans actually make decisions—we do seem to be able to consider counterfactuals in some reasonable way, but these counterfactuals are relatively fuzzy: we can’t do the counterfactual “what if 2+2=3”, we can do the counterfactual “what if I took the $10”, and we disagree on how to do the counterfactual “what would happen if we legalize drugs” (e.g. do we assume that public opinion has changed or not?). This makes me feel pessimistic about the goal of having a “correct” counterfactual—it seems likely that humans somehow build causal models of some aspects of the world (which do admit good counterfactuals), especially of the actions we can take, and not of others (like math), and disagreements on “correct” counterfactuals amount to disagreements on causal models. Of course, this just pushes the question down to how we build causal models—maybe we have an inductive bias that pushes us towards simple causal models, and the world just happens to be the kind where the data you observe constrains your models significantly, such that everyone ends up inferring similar causal models.

However, if we do build something like this, it seems hard to correctly solve most of the decision theory problems considered in this sequence, such as Newcomblike problems, at least if we use the intuitive notion of causality. Maybe this is okay, maybe not, I’m not sure. It definitely doesn’t feel like this resolves my confusion about how to make good decisions in general, though I could imagine that it could resolve my confusion about how to make good decisions in our actual universe (where causality seems important and “easy” to infer).

Embedded World-Models (Abram Demski and Scott Garrabrant): In order to get optimal behavior in an environment, you need to be able to model that environment in full detail, which an embedded agent cannot do. For example, AIXI is incomputable and gets optimal behavior in computable environments. If you use AIXI in an incomputable environment, it gets bounded loss on predictive accuracy compared to any computable predictor, but there are no results on absolute loss on predictive accuracy, or on the optimality of the actions it chooses. In general, if the environment is not in the space of hypotheses you can consider (that is, your environment hypothesis space is misspecified), then many bad issues can arise, as often happens with misspecification. This is called the grain-of-truth problem, so named because you have to deal with the fact that your prior does not even have a grain of truth (the true environment hypothesis).

One approach could be to learn a small yet well-specified model of the environment, such as the laws of physics, but not be able to compute all of the consequences of that model. This gives rise to the problem of logical uncertainty, where you would like to have beliefs about facts that can be deduced or refuted from facts you already know, but you lack the ability to do so. This requires a unification of logic and probability, which is surprisingly hard.

Another consequence is that our agents will need to have high-level world models—they need to be able to talk about things like chairs and tables as atoms, rather than thinking of everything as a quantum wavefunction. They will also have to deal with the fact that the high-level models will often conflict with models at lower levels, and that models at any level could shift and change without any change to models at other levels. An ontological crisis occurs when there is a change in the level at which our values are defined, such that it is not clear how to extrapolate our values to the new model. An analogy would be if our view of the world changed such that “happiness” no longer seemed like a coherent concept.

As always, we also have problems with self-reference—naturalized induction is the problem of learning a world model that includes the agent, and anthropic reasoning requires you to figure out how many copies of yourself exist in the world.

Rohin’s opinion: Warning: Ramblings about topics I haven’t thought about much.

The high-level and multi-level model problems sound similar to the problems that could arise with hierarchical reinforcement learning or hierarchical representation learning, though the emphasis here is on the inconsistencies between different levels rather than how to learn the model in the first place.

The grain of truth problem is one of the problems I am most confused about—in machine learning, model misspecification can lead to very bad results, so it is not clear how to deal with this even approximately in practice. (Whereas with decision theory, “approximate in-practice solutions” include learning causal models on which you can construct counterfactuals, or learning from experience what sort of decision-making algorithm tends to work well, and these solutions do not obviously fail as you scale up.) If you learn enough to rule out all of your hypotheses, as could happen with the grain of truth problem, what do you do then? If you’re working in a Bayesian framework, you end up going with the hypothesis you’ve disproven the least, which is probably not going to get you good results. If you’re working in logic, you get an error. I guess learning a model of the environment in model-based RL doesn’t obviously fail as you scale up.

Robust Delegation (Abram Demski and Scott Garrabrant): Presumably, we will want to build AI systems that become more capable as time goes on, whether simply by learning more or by constructing a more intelligent successor agent (i.e. self-improvement). In both cases, the agent would like to ensure that its future self continues to apply its intelligence in pursuit of the same goals, a problem known as Vingean reflection. The main issue is that the future agent is “bigger” (more capable) than the current agent, and so the smaller agent cannot predict it. In addition, from the future agent’s perspective, the current agent may be irrational, may not know what it wants, or could be made to look like it wants just about anything.

When constructing a successor agent, you face the value loading problem, where you need to specify what you want the successor agent to do, and you need to get it right because optimization amplifies (AN #13) mistakes, in particular via Goodhart’s Law. There’s a discussion of the types of Goodhart’s Law (also described in Goodhart Taxonomy). Another issue that arises in this setting is that the successor agent could take over the representation of the reward function and make it always output the maximal value, a phenomenon called “wireheading”, though this can be avoided if the agent’s plan to do this is evaluated by the current utility function.
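The “optimization amplifies mistakes” point can be seen in a few lines of numpy (toy numbers, not from the post): when the specified reward is the true value plus independent specification error, hard optimization of the proxy selects heavily for the error term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_value = rng.normal(0.0, 1.0, n)
# A misspecified reward: the true value plus substantial specification error.
proxy = true_value + rng.normal(0.0, 5.0, n)

# Optimizing the proxy hard picks out the candidate with the largest error,
# whose true value regresses toward the mean...
best_by_proxy = true_value[np.argmax(proxy)]
# ...and falls well short of the genuinely best candidates.
genuinely_best = float(np.mean(np.sort(true_value)[-100:]))
```

Mild optimization of this proxy would be fine; it is the extreme tail that is dominated by the error term, which is why optimization pressure turns a small specification mistake into a large loss.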

One hope is to create the successor agent from the original agent through intelligence amplification, along the lines of iterated amplification. However, this requires the current small agent to be able to decompose arbitrary problems, and to ensure that its proposed decomposition doesn’t give rise to malign subcomputations, a problem to be described in the next post on subsystem alignment.

Rohin’s opinion: This is a lot closer to the problem I think about frequently (since I focus on the principal-agent problem between a human and an AI), so I have a lot of thoughts about this, but they’d take a while to untangle and explain. Hopefully, a lot of these intuitions will be written up in the second part of the value learning sequence.

Value learning sequence

Preface to the Sequence on Value Learning (Rohin Shah): This is a preface; read it if you’re going to read the full posts, but not if you’re only going to read these summaries.

What is ambitious value learning? (Rohin Shah): The specification problem is the problem of defining the behavior we want out of an AI system. If we use the common model of a superintelligent AI maximizing some explicit utility function, this reduces to the problem of defining a utility function whose optimum is achieved by behavior that we want. We know that our utility function is too complex to write down (if it even exists), but perhaps we can learn it from data about human behavior? This is the idea behind ambitious value learning: to learn a utility function from human behavior that can be safely maximized. Note that since we are targeting the specification problem, we only want to define the behavior, so we can assume infinite compute, infinite data, perfect maximization, etc.

The easy goal inference problem is still hard (Paul Christiano): One concrete way of thinking about ambitious value learning is to think about the case where we have the full human policy, that is, we know how a particular human responds to all possible inputs (life experiences, memories, etc). In this case, it is still hard to infer a utility function from the policy. If we infer a utility function assuming that humans are optimal, then an AI system that maximizes this utility function will recover human behavior, but will not surpass it. In order to surpass human performance, we need to accurately model the mistakes a human makes, and correct for them when inferring a utility function. It’s not clear how to get this—the usual approach in machine learning is to choose more accurate models, but in this case even the most accurate model only gets us to human imitation.

Humans can be assigned any values whatsoever… (Stuart Armstrong): This post formalizes the thinking in the previous post. Since we need to model human irrationality in order to surpass human performance, we can formalize the human’s planning algorithm p, which takes as input a reward or utility function R, and produces a policy pi = p(R). Within this formalism, we would like to infer p and R for a human simultaneously, and then optimize R alone. However, the only constraint we have is that p(R) = pi, and there are many pairs of p and R that work besides the “reasonable” p and R that we are trying to infer. For example, p could be expected utility maximization and R could place reward 1 on the (history, action) pairs in the policy and reward 0 on any pair not in the policy. And for every pair, we can define a new pair (-p, -R) which negates the reward, with (-p)(R) defined to be p(-R), that is, the planner negates the reward (returning it to its original form) before using it. We could also have R = 0 and p be the constant function that always outputs the policy pi. All of these pairs reproduce the human policy pi, but if you throw away the planner p and optimize the reward R alone, you will get very different results. You might think that you could avoid this impossibility result by using a simplicity prior, but at least a Kolmogorov simplicity prior barely helps.

Technical agendas and prioritization

Discussion on the machine learning approach to AI safety (Vika) (summarized by Richard): This blog post (based on a talk at EA Global London) discusses whether current work on the machine learning approach to AI safety will remain relevant in the face of potential paradigmatic changes in ML systems. Vika and Jan rate how much they rely on each assumption in a list drawn from [this blog post by Jon Gauthier](http://2018/conceptual-issues-ai-safety-paradigmatic-gap/) (AN #13), and how likely each assumption is to hold up over time. They also evaluate arguments for human-in-the-loop approaches versus problem-specific approaches.

Richard’s opinion: This post concisely conveys a number of Vika and Jan’s views, albeit without explanations for most of them. I’d encourage other safety researchers to do the same exercise, with a view to fleshing out the cruxes behind whatever disagreements come up.

Learning human intent

BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop (Maxime Chevalier-Boisvert, Dzmitry Bahdanau et al): See Import AI.

One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks (Tianhe Yu et al)

Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time (Vinicius G. Goecks et al)

Inverse reinforcement learning for video games (Aaron Tucker et al)

Handling groups of agents

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL (Natasha Jaques et al)


Verification

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models (Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth et al)

Field building

The fastest way into a high-impact role as a machine learning engineer, according to Catherine Olsson & Daniel Ziegler (Catherine Olsson, Daniel Ziegler, and Rob Wiblin) (summarized by Richard): Catherine and Daniel both started PhDs, but left to work on AI safety (they’re currently at Google Brain and OpenAI respectively). They note that AI safety teams need research engineers to do implementation work, and that talented programmers can pick up the skills required within a few months, without needing to do a PhD. The distinction between research engineers and research scientists is fairly fluid—while research engineers usually work under the direction of a research scientist, they often do similar things.

Their advice on developing the skills needed to get into good research roles is not to start with a broad theoretical focus, but rather to dive straight into the details. Read and reimplement important papers, to develop technical ML expertise. Find specific problems relevant to AI safety that you’re particularly interested in, figure out what skills they require, and focus on those. They also argue that even if you want to eventually do a PhD, getting practical experience first is very useful, both technically and motivationally. While they’re glad not to have finished their PhDs, doing one can provide important mentorship.

This is a long podcast and there’s also much more discussion of object-level AI safety ideas, albeit mostly at an introductory level.

Richard’s opinion: Anyone who wants to get into AI safety (and isn’t already an AI researcher) should listen to this podcast—there’s a lot of useful information in it and this career transition guide. I agree that having more research engineers is very valuable, and that it’s a relatively easy transition for people with CS backgrounds to make. (I may be a little biased on this point, though, since it’s also the path I’m currently taking.)

I think the issue of PhDs and mentorship is an important and complicated one. The field of AI safety is currently bottlenecked to a significant extent by the availability of mentorship, and so even an ML PhD unrelated to safety can still be very valuable if it teaches you how to do good independent research and supervise others, without requiring the time of current safety researchers. Also note that the trade-offs involved vary quite a bit. In particular, European PhDs can be significantly shorter than US ones, and the one-year Masters degrees available in the UK are a quick and easy way to transition into research engineering roles.

Read more: Concrete next steps for transitioning from CS or software engineering into ML engineering for AI safety and alignment

Other progress in AI


Exploration

Reinforcement Learning with Prediction-Based Rewards (Yuri Burda and Harri Edwards): Summarized in the highlights!

Reinforcement learning

Assessing Generalization in Deep Reinforcement Learning (Charles Packer, Katelyn Gao et al) (summarized by Richard): This paper aims to create a benchmark for measuring generalisation in reinforcement learning. They evaluate a range of standard model-free algorithms on OpenAI Gym and Roboschool environments; the extent of generalisation is measured by varying environmental parameters at test time (note that these tasks are intended for algorithms which do not update at test time, unlike many transfer and multi-task learners). They distinguish between two forms of generalisation: interpolation (between values seen during training) and extrapolation (beyond them). The latter, which is typically much harder for neural networks, is measured by setting environmental parameters to more extreme values in testing than in training.
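The interpolation/extrapolation distinction comes down to how test-time parameter values relate to the training range. A sketch of the split (the ranges are hypothetical, not the paper’s; the paper varies parameters like pole length and cart mass):

```python
import random

TRAIN_RANGE = (0.5, 1.5)  # environment parameter values seen during training

def sample_eval_param(mode, rng):
    """Draw an evaluation-time environment parameter."""
    lo, hi = TRAIN_RANGE
    if mode == "interpolate":   # between values seen during training
        return rng.uniform(lo, hi)
    if mode == "extrapolate":   # beyond them, on either side
        width = hi - lo
        below = rng.uniform(lo - width, lo)
        above = rng.uniform(hi, hi + width)
        return rng.choice([below, above])
    raise ValueError(mode)

rng = random.Random(0)
interp = [sample_eval_param("interpolate", rng) for _ in range(100)]
extrap = [sample_eval_param("extrapolate", rng) for _ in range(100)]
```

An agent that never updates at test time must handle both splits with a fixed policy, which is what makes extrapolation the harder benchmark.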

Richard’s opinion: I agree that having standard benchmarks is often useful for spurring progress in deep learning, and that this one will be useful. I’m somewhat concerned that the tasks the authors have selected (CartPole, HalfCheetah, etc.) are too simple, and that the property they’re measuring is more like robustness to perturbations than the sort of combinatorial generalisation discussed in [this paper](https://arxiv.org/abs/1806.01261) from last week’s newsletter. The paper would benefit from more clarity about what they mean by “generalisation”.

Efficient Eligibility Traces for Deep Reinforcement Learning (Brett Daley et al)

Deep learning

Introducing AdaNet: Fast and Flexible AutoML with Learning Guarantees (Charles Weill)

Learned optimizers that outperform SGD on wall-clock and test loss (Luke Metz)

Unsupervised learning

Toward an AI Physicist for Unsupervised Learning (Tailin Wu et al)

Hierarchical RL

Neural Modular Control for Embodied Question Answering (Abhishek Das et al)


News

Introducing the AI Alignment Forum (FAQ) (habryka): Summarized in the highlights!