Alignment Newsletter #31


Introducing the AI Alignment Forum (FAQ) (habryka): The Alignment Forum has officially launched! It aims to be the single online hub for researchers to have conversations about all the ideas in the field, while also helping new researchers get up to speed. While posting is restricted to members, all content is cross-posted to LessWrong, where anyone can engage with it. In addition, for the next few weeks there will be a daily post from one of three new sequences on embedded agency, iterated amplification, and value learning.

Rohin’s opinion: I’m excited about this forum, and will be collating the value learning sequence for its launch. Since these sequences are meant to teach some of the key ideas in AI alignment, I would probably end up highlighting every single post. Instead, I’m going to create a new category for each sequence and summarize its posts each week within that category, but you should treat them as if I had highlighted them.

Reinforcement Learning with Prediction-Based Rewards (Yuri Burda and Harri Edwards) (summarized by Richard): Researchers at OpenAI have beaten average human performance on Montezuma’s Revenge using a prediction-based curiosity technique called Random Network Distillation (RND). A network with fixed random weights evaluates each state; another network with the same architecture is trained to predict the random network’s output, given its input. The agent receives an additional reward proportional to the predictor’s error on its current state. The idea behind the technique is that the predictor’s error will be higher on states different from those it has been trained on, and so the agent will be rewarded for exploring them.
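To make the mechanism concrete, here is a minimal sketch of the RND bonus. It assumes linear “networks” trained with plain SGD in pure Python. The class name `ToyRND`, the dimensions and the learning rate are illustrative choices and do not reflect the networks used in the paper. The target is frozen, the predictor has the same shape, and the bonus is the predictor’s squared error:

```python
import random


def random_matrix(rows, cols, rng):
    """A rows x cols weight matrix with entries drawn from N(0, 1)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyRND:
    """Random Network Distillation with linear networks (illustrative sketch)."""

    def __init__(self, state_dim, feature_dim, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Target network: fixed random weights, never trained.
        self.target = random_matrix(feature_dim, state_dim, rng)
        # Predictor network: same architecture, separately initialised, trained.
        self.predictor = random_matrix(feature_dim, state_dim, rng)
        self.lr = lr

    def _residual(self, state):
        prediction = matvec(self.predictor, state)
        target = matvec(self.target, state)
        return [p - t for p, t in zip(prediction, target)]

    def intrinsic_reward(self, state):
        """Exploration bonus: squared prediction error on this state."""
        return sum(r * r for r in self._residual(state))

    def update(self, state):
        """One SGD step on 0.5 * ||predictor(state) - target(state)||^2."""
        residual = self._residual(state)
        for i, r in enumerate(residual):
            for j, s in enumerate(state):
                self.predictor[i][j] -= self.lr * r * s


rnd = ToyRND(state_dim=4, feature_dim=8)
visited = [1.0, 0.0, 0.0, 0.0]
novel = [0.0, 1.0, 0.0, 0.0]
for _ in range(200):  # the agent keeps revisiting the same state
    rnd.update(visited)
print(rnd.intrinsic_reward(visited) < rnd.intrinsic_reward(novel))  # True
```

The target is a fixed deterministic function that the predictor can represent exactly. As a result, the bonus on a frequently visited state decays towards zero instead of staying high, while an unvisited state keeps its initial error.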

This paper follows up on their study of curiosity (AN #20), in which a predictor was trained to predict the next state directly and the agent was rewarded when its error was high. However, this led to high reward on states that were unpredictable due to model limitations or stochasticity (e.g. the noisy TV problem). By contrast, Random Network Distillation only requires the prediction of a deterministic function, which is definitely within the class of functions representable by the predictor (since it has the same architecture as the random network).

Richard’s opinion: This is an important step forward for curiosity-driven agents. As the authors note in the paper, RND has the additional advantages of being simple to implement and flexible.

Technical AI alignment

Embedded agency sequence

Embedded Agents (Abram Demski and Scott Garrabrant): This post introduces embedded agency, which refers to a notion of an “agent” that is more realistic than the version considered in mainstream AI, which is best formalized by AIXI. An embedded agent is actually a part of the environment it is acting in, as opposed to our current AI agents, which model the environment as external to them. The problems around embedded agency fall into four main clusters, which future posts will discuss.

Rohin’s opinion: This post is a great summary of the sequence to come, and is intuitive and easy to understand. I strongly recommend reading the full post; I haven’t summarized it much because it already is a good summary.

Decision Theory (Abram Demski and Scott Garrabrant): The major issue with porting decision theory to the embedded agency setting is that there is no longer a clear, well-defined boundary between actions and outcomes, such that we can say “if I take this action, then this outcome occurs”. In an embedded setting, the agent is just another part of the environment. So if the agent is reasoning about the environment, it can also reason about itself, and its reasoning can tell it something about what its actions will be. But if you know what action you are going to take, how do you properly think about the counterfactual “what if I had taken this other action”?

A formalization in logic, where counterfactuals are represented by logical implication, doesn’t work. If you know what your action is going to be, then the premise of the counterfactual (that you take some other action) is false, and from a false premise you can conclude anything. The post gives a concrete example of a reasonable-looking agent that ends up taking $5 when offered a choice between $5 and $10. It does so because it can prove that “if I took $10, then I would get $0”, which is in fact true, since it took $5 and not $10!

A formalization in probability theory doesn’t work either: if you condition on an alternative action that you know you won’t take, you are conditioning on a probability-zero event. You could say that there is always some uncertainty in which action you take, or force the agent to always explore with some small probability. Your agent would then reason about alternative actions under the assumption that there was some hardware failure, or that it was forced to explore, and this seems like the wrong way to reason about alternatives.
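Both failures can be reproduced in a few lines. The sketch below assumes a toy world in which taking a bill pays its face value. It treats “A() = a → U() = u” as material implication, checked against what the agent actually does, and then shows that conditioning on the untaken action divides by a zero probability. The function names are hypothetical, and evaluating statements in the actual world is a stand-in for an agent that can fully deduce its own behavior:

```python
def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def true_conditionals(actual_action, actual_utility, actions, utilities):
    """All (a, u) such that 'A() == a -> U() == u' holds in the actual world."""
    return {
        (a, u)
        for a in actions
        for u in utilities
        if implies(actual_action == a, actual_utility == u)
    }


def conditional_expectation(joint, action):
    """E[U | A == action] for a joint distribution given as {(action, utility): prob}."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        raise ZeroDivisionError(f"P(A == {action}) is zero")
    return sum(u * p for (a, u), p in joint.items() if a == action) / p_action


# The agent in fact takes the $5 bill and gets 5.
claims = true_conditionals(5, 5, actions=[5, 10], utilities=[0, 5, 10])
print(sorted(claims))  # [(5, 5), (10, 0), (10, 5), (10, 10)]

# An agent certain of its own action cannot condition on the other one.
joint = {(5, 5): 1.0}
print(conditional_expectation(joint, 5))  # 5.0
try:
    conditional_expectation(joint, 10)
except ZeroDivisionError as err:
    print("undefined:", err)
```

Every claim about the untaken $10 branch comes out true, including (10, 0), the “if I took $10, I would get $0” that the agent relies on. Only the branch it actually takes is constrained.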

Changing tack a bit, how would we think about “What if 2+2=3?” This seems like a pretty hard counterfactual for us to evaluate; it’s not clear what it means. There may just be no “correct” counterfactuals. In that case, we still need to figure out how intelligent agents like humans successfully consider alternative actions that they are not going to take, in order to make good decisions. One approach is Updateless Decision Theory (UDT), which takes the action your earlier self would have wanted to commit to; this comes closer to viewing the problem from the outside. UDT neatly resolves many of the problems in decision theory, including counterfactual mugging (described in the post). However, it assumes that your earlier self can foresee all outcomes. That can’t happen for embedded agents, because the environment is bigger than the agent and any world model can only be approximate (the subject of the next post).
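For readers who skip the post, here is the setup of counterfactual mugging. A trustworthy predictor flips a fair coin. On tails it asks you for $100; on heads it pays you $10,000, but only if it predicts you would have paid on tails. The stakes used here are the commonly quoted ones and are an assumption of this sketch. Scoring the policy from the prior, in the spirit of UDT, favors paying. Scoring the choice after you have already seen tails favors refusing:

```python
P_HEADS = 0.5
REWARD = 10_000  # paid on heads, only to agents whose policy is to pay on tails
COST = 100       # what the predictor asks for on tails


def payoff(policy_pays, coin):
    if coin == "heads":
        return REWARD if policy_pays else 0
    return -COST if policy_pays else 0


def updateless_value(policy_pays):
    """Score a policy from the prior, before seeing the coin (UDT-style)."""
    return (P_HEADS * payoff(policy_pays, "heads")
            + (1 - P_HEADS) * payoff(policy_pays, "tails"))


def updateful_value(policy_pays, observed="tails"):
    """Score the same choice after conditioning on the observed coin."""
    return payoff(policy_pays, observed)


print(updateless_value(True), updateless_value(False))  # 4950.0 0.0
print(updateful_value(True), updateful_value(False))    # -100 0
```

Note that `updateless_value` needs the full list of outcomes and their probabilities up front. That is the “earlier self can foresee all outcomes” assumption the summary says breaks down for embedded agents.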

Rohin’s opinion: Warning: Ramblings about topics I haven’t thought about much.

I’m certainly confused about how humans actually make decisions. We do seem to be able to consider counterfactuals in some reasonable way, but these seem relatively fuzzy. We can’t do the counterfactual “what if 2+2=3”, we can do the counterfactual “what if I took the $10”, and we disagree on how to do the counterfactual “what would happen if we legalized drugs” (e.g. do we assume that public opinion has changed or not?). This makes me feel pessimistic about the goal of having a “correct” counterfactual. It seems likely that humans somehow build causal models of some aspects of the world (which do admit good counterfactuals), especially of the actions we can take, and not of others (like math). On this view, disagreements on “correct” counterfactuals amount to disagreements on causal models. Of course, this just pushes the question down to how we build causal models. Maybe we have an inductive bias that pushes us towards simple causal models, and the world just happens to be the kind where the data you observe constrains your models significantly, such that everyone ends up inferring similar causal models.
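One way to see why a causal model admits a sensible “what if I took the $10” is to treat the counterfactual as an intervention. You override the mechanism that sets the action and rerun the downstream equations, instead of conditioning on a premise you know to be false. Here is a minimal sketch of that idea, using a hypothetical two-node model (action → payoff) chosen purely for illustration:

```python
def policy():
    """The mechanism that normally sets the action: this agent takes the $5 bill."""
    return 5


def payoff_equation(action):
    """Structural equation for the payoff: you get the face value of the bill you take."""
    return action


def run_model(do_action=None):
    """Evaluate the model; do_action, if given, replaces the policy (an intervention)."""
    action = policy() if do_action is None else do_action
    return {"action": action, "payoff": payoff_equation(action)}


print(run_model())              # {'action': 5, 'payoff': 5}
print(run_model(do_action=10))  # {'action': 10, 'payoff': 10}
```

On this view, people who disagree about the drug-legalization counterfactual would be using different structural equations (for example, on whether public opinion is downstream of the law), rather than different logic.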

However, if we do build something like this, it seems hard to correctly solve most of the decision theory problems that they consider, such as Newcomb-like problems, at least if we use the intuitive notion of causality. Maybe this is okay, maybe not; I’m not sure. It definitely doesn’t feel like this resolves my confusion about how to make good decisions in general. I could imagine, though, that it resolves my confusion about how to make good decisions in our actual universe (where causality seems important and “easy” to infer).

Embedded World-Models (Abram Demski and Scott Garrabrant): In order to get optimal behavior in an environment, you need to be able to model the environment in full detail, which an embedded agent cannot do. For example, AIXI is incomputable and gets optimal behavior on computable environments. If you use AIXI in an incomputable environment, it gets bounded loss on predictive accuracy compared to any computable predictor. However, there are no results on absolute loss on predictive accuracy, or on the optimality of the actions it chooses. In general, many problems can arise if the environment is not in the space of hypotheses you can consider, that is, if your hypothesis space is misspecified (as often happens with misspecification). This is called the grain-of-truth problem, so named because your prior does not contain even a grain of truth (the true environment hypothesis).
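A small simulation shows what misspecification does to an otherwise exact Bayesian learner. Assume, purely for illustration, a coin with heads probability 0.6 and a hypothesis class that contains only biases 0.1 and 0.9, so the prior has no grain of truth. The posterior still concentrates, but on the hypothesis that is least wrong (closest in KL divergence), and its predictions stay badly miscalibrated:

```python
import math
import random


def posterior(flips, hypotheses, prior=None):
    """Exact Bayesian posterior over a finite set of coin-bias hypotheses."""
    if prior is None:
        prior = [1.0 / len(hypotheses)] * len(hypotheses)
    heads = sum(flips)
    tails = len(flips) - heads
    # Log space avoids underflow on long sequences.
    log_post = [math.log(p) + heads * math.log(h) + tails * math.log(1.0 - h)
                for p, h in zip(prior, hypotheses)]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
true_bias = 0.6          # the actual environment ...
hypotheses = [0.1, 0.9]  # ... is not in the hypothesis class
flips = [1 if rng.random() < true_bias else 0 for _ in range(1000)]

post = posterior(flips, hypotheses)
predicted_heads = sum(w * h for w, h in zip(post, hypotheses))
print(post)             # almost all mass on 0.9
print(predicted_heads)  # close to 0.9, although the true rate is 0.6
```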

One approach could be to learn a small yet well-specified model of the environment, such as the laws of physics, without being able to compute all of the consequences of that model. This gives rise to the problem of logical uncertainty. You would like to have beliefs about facts that could in principle be deduced or refuted from facts you already know, but you lack the ability to do the deduction. This requires a unification of logic and probability, which is surprisingly hard.

Another consequence is that our agents will need to have high-level world models: they need to be able to talk about things like chairs and tables as atoms, rather than thinking of everything as a quantum wavefunction. They will also have to deal with two complications. High-level models will often conflict with models at lower levels, and models at any level could shift and change without any change to models at other levels. An ontological crisis occurs when the model at the level where our values are defined changes, such that it is not clear how to extrapolate our values to the new model. An analogy would be if our view of the world changed such that “happiness” no longer seemed like a coherent concept.

As always, we also have problems with self-reference: naturalized induction is the problem of learning a world model that includes the agent, and anthropic reasoning requires you to figure out how many copies of yourself exist in the world.

Rohin’s opinion: Warning: Ramblings about topics I haven’t thought about much.

The high-level and multi-level model problems sound similar to the problems that could arise with hierarchical reinforcement learning or hierarchical representation learning, though the emphasis here is on the inconsistencies between different levels rather than on how to learn the model in the first place.

The grain-of-truth problem is one of the problems I am most confused about. In machine learning, model misspecification can lead to very bad results, so it is not clear how to deal with this even approximately in practice. (With decision theory, by contrast, “approximate in-practice solutions” include learning causal models on which you can construct counterfactuals, or learning from experience what sort of decision-making algorithm tends to work well, and these solutions do not obviously fail as you scale up.) If you learn enough to rule out all of your hypotheses, as could happen with the grain-of-truth problem, what do you do then? If you’re working in a Bayesian framework, you end up going with the hypothesis you’ve disproven the least, which is probably not going to get you good results. If you’re working in logic, you get an error. I guess learning a model of the environment in model-based RL doesn’t obviously fail if you scale up.

Robust Delegation (Abram Demski and Scott Garrabrant): Presumably, we will want to build AI systems that become more capable as time goes on, whether simply by learning more or by constructing a more intelligent successor agent (i.e. self-improvement). In both cases, the agent would like to ensure that its future self continues to apply its intelligence in pursuit of the same goals, a problem known as Vingean reflection. The main issue is that the future agent is “bigger” (more capable) than the current agent, and so the smaller agent cannot predict it. In addition, from the future agent’s perspective, the current agent may be irrational, may not know what it wants, or could be made to look like it wants just about anything.

When constructing a successor agent, you face the value loading problem. You need to specify what you want the successor agent to do, and you need to get it right, because optimization amplifies (AN #13) mistakes, in particular via Goodhart’s Law. The post discusses the types of Goodhart’s Law (also described in Goodhart Taxonomy). Another issue that arises in this setting is that the successor agent could take over the representation of the reward function and make it always output the maximal value, a phenomenon called “wireheading”. This can be avoided if the agent’s plan to do this is evaluated by the current utility function.
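Two toy sketches may help here. The first illustrates regressional Goodhart, one of the types in the taxonomy: if a proxy equals the true value plus independent noise, picking the candidate with the best proxy systematically overestimates its true value. The second shows why scoring a tampering plan with the current utility function, instead of with the reward register it would tamper with, removes the incentive to wirehead. The Gaussian noise model, the plan names and all numbers are assumptions made for illustration:

```python
import random


def regressional_goodhart(n_candidates=1000, noise_sd=1.0, trials=100, seed=0):
    """Average proxy score and true value of the candidate that maximises the proxy."""
    rng = random.Random(seed)
    total_proxy = total_true = 0.0
    for _ in range(trials):
        candidates = []
        for _ in range(n_candidates):
            true_value = rng.gauss(0.0, 1.0)
            proxy = true_value + rng.gauss(0.0, noise_sd)  # proxy = value + noise
            candidates.append((proxy, true_value))
        best_proxy, best_true = max(candidates)            # optimise the proxy
        total_proxy += best_proxy
        total_true += best_true
    return total_proxy / trials, total_true / trials


# Hypothetical plans and the world states they lead to.
PLANS = {
    "do_task": {"task_done": True, "reward_register": 1.0},
    "tamper": {"task_done": False, "reward_register": 1e9},
}


def register_reading(outcome):
    """What a wireheading agent effectively maximises."""
    return outcome["reward_register"]


def current_utility(outcome):
    """The current utility function, defined over the world rather than the register."""
    return 1.0 if outcome["task_done"] else 0.0


def choose(plans, evaluate):
    return max(plans, key=lambda name: evaluate(plans[name]))


proxy_score, true_score = regressional_goodhart()
print(f"selected proxy {proxy_score:.2f} vs true value {true_score:.2f}")
print(choose(PLANS, register_reading))  # tamper
print(choose(PLANS, current_utility))   # do_task
```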

One hope is to create the successor agent from the original agent through intelligence amplification, along the lines of iterated amplification. However, this requires the current small agent to be able to decompose arbitrary problems, and to ensure that its proposed decomposition doesn’t give rise to malign subcomputations, a problem to be described in the next post on subsystem alignment.
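The core move in amplification is to answer a question that is too hard for the small agent by splitting it into subquestions the agent can handle, then combining the answers. In the toy sketch below, assumed purely for illustration, the “weak agent” can only add up to two numbers, yet recursive decomposition lets it sum a list of any length. The decomposition is hand-written here, so the example deliberately sidesteps the hard parts named in the summary: finding decompositions for arbitrary problems and ruling out malign subcomputations.

```python
def weak_agent(numbers):
    """The small agent: can only answer directly for at most two numbers."""
    if len(numbers) > 2:
        raise ValueError("too hard for the weak agent")
    return sum(numbers)


def amplified(numbers):
    """Answer a bigger question by decomposing it and letting the weak agent combine sub-answers."""
    if len(numbers) <= 2:
        return weak_agent(numbers)
    mid = len(numbers) // 2
    return weak_agent([amplified(numbers[:mid]), amplified(numbers[mid:])])


print(amplified(list(range(1, 101))))  # 5050
```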

Rohin’s opinion: This is a lot closer to the problem I think about frequently (since I focus on the principal-agent problem between a human and an AI), so I have a lot of thoughts about this, but they’d take a while to untangle and explain. Hopefully, a lot of these intuitions will be written up in the second part of the value learning sequence.

Value learning sequence

Preface to the Sequence on Value Learning (Rohin Shah): This is a preface. Read it if you’re going to read the full posts, but not if you’re only going to read these summaries.

What is ambitious value learning? (Rohin Shah): The specification problem is the problem of defining the behavior we want out of an AI system. If we use the common model of a superintelligent AI maximizing some explicit utility function, this reduces to the problem of defining a utility function whose optimum is achieved by behavior that we want. We know that our utility function is too complex to write down (if it even exists), but perhaps we can learn it from data about human behavior? This is the idea behind ambitious value learning: to learn a utility function from human behavior that can be safely maximized. Note that since we are targeting the specification problem, we only want to define the behavior, so we can assume infinite compute, infinite data, perfect maximization, etc.

The easy goal inference problem is still hard (Paul Christiano): One concrete way of thinking about ambitious value learning is to consider the case where we have the full human policy, that is, we know how a particular human responds to all possible inputs (life experiences, memories, etc.). Even in this case, it is hard to infer a utility function from the policy. If we infer a utility function assuming that humans are optimal, then an AI system that maximizes this utility function will recover human behavior, but will not surpass it. In order to surpass human performance, we need to accurately model the mistakes a human makes, and correct for them when inferring a utility function. It’s not clear how to get this. The usual approach in machine learning is to choose more accurate models, but in this case even the most accurate model only gets us to human imitation.

Humans can be assigned any values whatsoever… (Stuart Armstrong): This post formalizes the thinking in the previous post. Since we need to model human irrationality in order to surpass human performance, we can formalize the human’s planning algorithm p, which takes as input a reward or utility function R and produces a policy pi = p(R). Within this formalism, we would like to infer p and R for a human simultaneously, and then optimize R alone. However, the only constraint we have is that p(R) = pi, and besides the “reasonable” p and R that we are trying to infer, many other pairs satisfy it. For example, p could be expected utility maximization, and R could place reward 1 on the (history, action) pairs in the policy and reward 0 on any pair not in the policy. For every pair (p, R), we can also define a new pair (-p, -R) that negates the reward, with (-p)(R) defined to be p(-R); that is, the planner negates the reward (returning it to its original form) before using it. We could also have R = 0 and p be the constant function that always outputs the policy pi. All of these pairs reproduce the human policy pi, but if you throw away the planner p and optimize the reward R alone, you will get very different results. You might think that you could avoid this impossibility result by using a simplicity prior, but at least a Kolmogorov simplicity prior barely helps.
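The degenerate decompositions are easy to check on a toy problem. The sketch below assumes two histories, two actions and a hypothetical observed policy. It uses a one-step argmax as a simplified stand-in for expected utility maximization. Each (planner, reward) pair reproduces the policy exactly, yet maximizing the reward alone gives three different answers:

```python
HISTORIES = ["h0", "h1"]
ACTIONS = ["a", "b"]
human_policy = {"h0": "a", "h1": "b"}  # hypothetical observed behaviour pi


def rational(reward):
    """Planner p: in each history, take the action with the highest reward.

    Ties are broken by list order, because max() returns the first maximal item.
    """
    return {h: max(ACTIONS, key=lambda a: reward[(h, a)]) for h in HISTORIES}


def anti_rational(reward):
    """Planner -p, defined by (-p)(R) = p(-R): negate the reward, then plan."""
    return rational({key: -value for key, value in reward.items()})


def constant(reward):
    """Planner that ignores the reward and always outputs the observed policy."""
    return dict(human_policy)


r_indicator = {(h, a): 1.0 if human_policy[h] == a else 0.0
               for h in HISTORIES for a in ACTIONS}
r_negated = {key: -value for key, value in r_indicator.items()}
r_zero = {key: 0.0 for key in r_indicator}

pairs = {
    "(p, R)": (rational, r_indicator),
    "(-p, -R)": (anti_rational, r_negated),
    "(constant, 0)": (constant, r_zero),
}

for name, (planner, reward) in pairs.items():
    assert planner(reward) == human_policy  # every pair fits the data exactly
    print(name, rational(reward))           # but optimizing R alone disagrees

# Output:
# (p, R) {'h0': 'a', 'h1': 'b'}
# (-p, -R) {'h0': 'b', 'h1': 'a'}
# (constant, 0) {'h0': 'a', 'h1': 'a'}
```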

Technical agendas and prioritization

Discussion on the machine learning approach to AI safety (Vika) (summarized by Richard): This blog post (based on a talk at EA Global London) discusses whether current work on the machine learning approach to AI safety will remain relevant in the face of potential paradigmatic changes in ML systems. Vika and Jan rate how much they rely on each assumption in a list drawn from this blog post by Jon Gauthier (AN #13), and how likely each assumption is to hold up over time. They also evaluate arguments for human-in-the-loop approaches versus problem-specific approaches.

Richard’s opinion: This post concisely conveys a number of Vika and Jan’s views, albeit without explanations for most of them. I’d encourage other safety researchers to do the same exercise, with a view to fleshing out the cruxes behind whatever disagreements come up.

Learning human intent

BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop (Maxime Chevalier-Boisvert, Dzmitry Bahdanau et al): See Import AI.

One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks (Tianhe Yu et al)

Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time (Vinicius G. Goecks et al)

Inverse reinforcement learning for video games (Aaron Tucker et al)

Handling groups of agents

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL (Natasha Jaques et al)


On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models (Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth et al)

Field building

The fastest way into a high-impact role as a machine learning engineer, according to Catherine Olsson & Daniel Ziegler (Catherine Olsson, Daniel Ziegler, and Rob Wiblin) (summarized by Richard): Catherine and Daniel both started PhDs, but left to work on AI safety (they’re currently at Google Brain and OpenAI respectively). They note that AI safety teams need research engineers to do implementation work, and that talented programmers can pick up the skills required within a few months, without needing to do a PhD. The distinction between research engineers and research scientists is fairly fluid: while research engineers usually work under the direction of a research scientist, they often do similar things.

Their advice on developing the skills needed to get into good research roles is not to start with a broad theoretical focus, but rather to dive straight into the details. Read and reimplement important papers to develop technical ML expertise. Find specific problems relevant to AI safety that you’re particularly interested in, figure out what skills they require, and focus on those. They also argue that even if you want to eventually do a PhD, getting practical experience first is very useful, both technically and motivationally. While they’re glad not to have finished their PhDs, doing one can provide important mentorship.

This is a long podcast, and there’s also much more discussion of object-level AI safety ideas, albeit mostly at an introductory level.

Richard’s opinion: Anyone who wants to get into AI safety (and isn’t already an AI researcher) should listen to this podcast; there’s a lot of useful information in it and in this career transition guide. I agree that having more research engineers is very valuable, and that it’s a relatively easy transition for people with CS backgrounds to make. (I may be a little biased on this point, though, since it’s also the path I’m currently taking.)

I think the issue of PhDs and mentorship is an important and complicated one. The field of AI safety is currently bottlenecked to a significant extent by the availability of mentorship. So even an ML PhD unrelated to safety can still be very valuable if it teaches you how to do good independent research and supervise others, without requiring the time of current safety researchers. Also note that the trade-offs involved vary quite a bit. In particular, European PhDs can be significantly shorter than US ones, and the one-year Master’s degrees available in the UK are a quick and easy way to transition into research engineering roles.

Read more: Concrete next steps for transitioning from CS or software engineering into ML engineering for AI safety and alignment

Other progress in AI


Reinforcement Learning with Prediction-Based Rewards (Yuri Burda and Harri Edwards): Summarized in the highlights!

Reinforcement learning

Assessing Generalization in Deep Reinforcement Learning (Charles Packer, Katelyn Gao et al) (summarized by Richard): This paper aims to create a benchmark for measuring generalisation in reinforcement learning. They evaluate a range of standard model-free algorithms on OpenAI Gym and Roboschool environments. The extent of generalisation is measured by varying environmental parameters at test time (note that these tasks are intended for algorithms which do not update at test time, unlike many transfer and multi-task learners). They distinguish between two forms of generalisation: interpolation (between values seen during training) and extrapolation (beyond them). The latter, which is typically much harder for neural networks, is measured by setting environmental parameters to more extreme values in testing than in training.
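The interpolation/extrapolation split comes down to where test-time environment parameters are drawn from. A sketch of that protocol follows. The single parameter, its ranges and the `rollout_return` callback are hypothetical stand-ins, not the paper’s actual settings. As the benchmark requires, the policy is held fixed at test time:

```python
import random

# Hypothetical ranges for one environment parameter (say, a pole length).
TRAIN_RANGE = (0.5, 1.5)                   # values seen during training
EXTREME_RANGES = [(0.1, 0.5), (1.5, 2.5)]  # more extreme values, used only for testing


def sample_param(split, rng):
    """Draw a test-time parameter for the 'interpolation' or 'extrapolation' split."""
    if split == "interpolation":
        return rng.uniform(*TRAIN_RANGE)
    if split == "extrapolation":
        low, high = rng.choice(EXTREME_RANGES)
        return rng.uniform(low, high)
    raise ValueError(f"unknown split: {split}")


def evaluate(rollout_return, split, episodes=100, seed=0):
    """Mean return of a fixed policy; rollout_return(param) runs one episode with no updates."""
    rng = random.Random(seed)
    total = sum(rollout_return(sample_param(split, rng)) for _ in range(episodes))
    return total / episodes


# Usage with a stand-in rollout that ignores the parameter:
print(evaluate(lambda param: 1.0, "interpolation"))  # 1.0
```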

Richard’s opinion: I agree that having standard benchmarks is often useful for spurring progress in deep learning, and that this one will be useful. However, I’m somewhat concerned that the tasks the authors have selected (CartPole, HalfCheetah, etc.) are too simple. I’m also concerned that the property they’re measuring is more like robustness to perturbations than the sort of combinatorial generalisation discussed in this paper (arXiv:1806.01261) from last week’s newsletter. The paper would benefit from more clarity about what they mean by “generalisation”.

Efficient Eligibility Traces for Deep Reinforcement Learning (Brett Daley et al)

Deep learning

Introducing AdaNet: Fast and Flexible AutoML with Learning Guarantees (Charles Weill)

Learned optimizers that outperform SGD on wall-clock and test loss (Luke Metz)

Unsupervised learning

Toward an AI Physicist for Unsupervised Learning (Tailin Wu et al)

Hierarchical RL

Neural Modular Control for Embodied Question Answering (Abhishek Das et al)


Introducing the AI Alignment Forum (FAQ) (habryka): Summarized in the highlights!
