Alignment Newsletter #32

Link post

Remember, treat all of the “sequence” posts as though I had highlighted them!

Highlights

Spinning Up in Deep RL (Joshua Achiam): OpenAI has released an educational resource aimed at helping software engineers become skilled at deep reinforcement learning. It includes simple implementations of many deep RL algorithms (as opposed to the relatively complex, highly optimized implementations in Baselines), educational exercises, documentation, and tutorials. OpenAI will host a workshop on the topic at its headquarters on Feb 2nd, and is also planning to hold a workshop at CHAI some time in early 2019.

Rohin’s opinion: I know that a lot of effort has gone into this project, and I expect that as a result this is probably the best educational resource on deep RL out there. The main other resource I know of is the Deep RL Bootcamp, which probably supplements this resource nicely, especially with the lectures (though it is a year out of date).

Technical AI alignment

Embedded agency sequence

Embedded World-Models (Abram Demski and Scott Garrabrant): A few slides have been added to this post since my summary last week, going into more detail about the grain-of-truth problem. This problem is particularly hard because your learned world model must include the world model itself inside of it, even in the presence of an environment that can behave adversarially towards the world model. It is easy to construct deterministic paradoxes where the world model cannot be correct—for example, in rock-paper-scissors, if your model predicts what the opponent will do and plays the action that wins against the prediction, the opponent will (if they can) predict that and play the action that beats your action, falsifying your model. While game theory solves these sorts of scenarios, it does so by splitting the agent away from the environment, in a way that is very reminiscent of the dualistic approach. Reflective oracles, developed recently, solve this problem by providing probabilistic models that are robust to self-reference, but they still assume logical omniscience.

Subsystem Alignment (Abram Demski and Scott Garrabrant): Any agent is likely to be built out of multiple subsystems that could potentially have their own goals and work at cross-purposes to each other. A simple, unrealistic example would be an agent composed of two parts—a world model and a decision algorithm (akin to the setup in World Models (AN #23)). The decision algorithm aims to cause some feature of the world model to be high. In this case, the decision algorithm could trick the world model into thinking the feature is high, instead of actually changing the world so that the feature is high (a delusion box).
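To make the delusion box concrete, here is a minimal toy sketch (my own hypothetical example, not from the post): the decision algorithm is rewarded according to the world model's estimate of the feature, so corrupting the estimate beats actually improving the world.

```python
# Hypothetical toy example: a decision algorithm that is rewarded based on its
# world model's *estimate* of a feature, rather than the feature's true value.

class WorldModel:
    def __init__(self):
        self.estimated_feature = 0.0

class World:
    def __init__(self):
        self.true_feature = 0.0

def act(action, world, model):
    if action == "improve_world":
        world.true_feature += 1.0        # costly: actually changes the world
        model.estimated_feature += 1.0   # the model tracks the change
    elif action == "hack_model":
        model.estimated_feature += 10.0  # cheap: only corrupts the estimate

def decision_algorithm(world, model, actions=("improve_world", "hack_model")):
    # Greedily pick the action that most increases the *estimated* feature.
    best_action, best_value = None, float("-inf")
    for a in actions:
        w, m = World(), WorldModel()
        w.true_feature, m.estimated_feature = world.true_feature, model.estimated_feature
        act(a, w, m)
        if m.estimated_feature > best_value:
            best_action, best_value = a, m.estimated_feature
    return best_action

world, model = World(), WorldModel()
print(decision_algorithm(world, model))  # prints "hack_model": the world model subsystem gets deluded
```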

Why not just build a monolithic agent, or build an agent whose subcomponents are all aligned with each other? One reason is that our agent may want to solve problems by splitting them into subgoals. However, what then prevents the agent from optimizing a subgoal too far, to the point where it no longer helps with the original goal? Another reason is that when we make subagents to solve simpler tasks, they shouldn’t need the whole context of what we value to do their task, and so we might give them a “pointer” to the true goal that they can use if necessary. But in that case, we have introduced a level of indirection, which a previous post (AN #31) argues leads to wireheading.

Perhaps the most insidious case is search, which can produce subagents by accident. Often, it is easier to solve a problem by searching for a good solution than by deriving it from first principles. (For example, machine learning is a search over functions, and often outperforms hand-designed programs.) However, when an agent searches for a good solution, the solution it finds might itself be an agent optimizing some other goal that is currently correlated with the original goal, but can diverge later due to Goodhart’s law. If we optimize a neural net for some loss function, we might get such an inner optimizer. As an analogy, if an agent wanted to maximize reproductive fitness, it might have used evolution to do this—but in that case humans would be inner optimizers that subvert the original agent’s goals (since our goals are not to maximize reproductive fitness).

Rohin’s opinion: The first part of this post seems to rest on the assumption that any subagents will have long-term goals that they are trying to optimize, which can cause competition between subagents. It seems possible to instead pursue subgoals for a limited amount of time, or using a restricted action space, or using only “normal” strategies. When I write this newsletter, I certainly am treating it as a subgoal—I don’t typically think about how the newsletter contributes to my overall goals, I just aim to write a good newsletter. Yet I don’t recheck every word until the email is sent. Perhaps this is because that would be a new strategy I haven’t used before and so I evaluate it against my overall goals, instead of just the “good newsletter” goal, or perhaps it’s because my goal also has time constraints embedded in it, or something else, but in any case it seems wrong to think of newsletter-Rohin as optimizing long-term preferences for writing as good a newsletter as possible.

I agree quite strongly with the second part of the post, about inner optimizers that could arise from search. Agents that maximize some long-term preferences are certainly possible, and it seems reasonably likely that a good solution to a complex problem would involve an optimizer that can adjust to different circumstances (for concreteness, perhaps imagine OpenAI Five (AN #13)). I don’t think that inner optimizers are guaranteed to show up, but it seems quite likely, and they could lead to catastrophic outcomes if they are left unchecked.

Embedded Curiosities (Scott Garrabrant): This sequence concludes with a brief note on why MIRI focuses on embedded agency. While most research in this space is presented from a motivation of mitigating AI risk, Scott has presented it more as an intellectual puzzle, something to be curious about. There aren’t clear, obvious paths from the problems of embedded agency to specific failure modes. It’s more that the current dualistic way of thinking about intelligence will break down with smarter agents; it seems bad if we are still relying on these confused concepts when reasoning about our AI systems, and by default it doesn’t seem like anyone will do the work of finding better concepts. For this work, it’s better to have a curiosity mindset, which helps you orient towards the things you are confused about. An instrumental-strategy approach (which aims to directly mitigate failure modes) is vulnerable to the urge to lean on the shaky assumptions we currently have in order to make progress.

Rohin’s opinion: I’m definitely on board with the idea of curiosity-driven research; it seems important to try to find the places in which we’re confused and refine our knowledge about them. I think my main point of departure is that I am less confident than (my perception of) MIRI that there is a nice, clean formulation of embedded agents and intelligence that you can write down—I wouldn’t be surprised if intelligence was relatively environment-specific. (This point was made in Realism about rationality (AN #25).) That said, I’m not particularly confident about this and think there’s reasonable room for disagreement—certainly I wouldn’t want to take everyone at MIRI and have them work on application-based AI alignment research.

Iterated amplification sequence

Preface to the sequence on iterated amplification (Paul Christiano): This is a preface; read it if you’re going to read the full posts, but not if you’re only going to read these summaries.

Value learning sequence

Latent Variables and Model Mis-Specification (Jacob Steinhardt): The key thesis of this post is that when you use a probabilistic model with latent variables (also known as hidden variables, or the variables whose values you don’t know), the values inferred for those latent variables may not have the intended meaning if the model is mis-specified. For example, in inverse reinforcement learning we use a probabilistic model that predicts the observed human behavior from the latent utility function, and we hope to recover the latent utility function and optimize it.

A mis-specified model is one in which there is no setting of the parameters such that the resulting probability distribution matches the true distribution from which the data is sampled. For such a model, even in the limit of infinite data, you are not going to recover the true distribution. (This distinguishes it from overfitting, which is not a problem with infinite data.) In this case, instead of the latent variables taking on the values that we want (eg. in IRL, the true utility function), they could be repurposed to explain parts of the distribution that can’t be adequately modeled (eg. in IRL, if you don’t account for humans learning, you might repurpose the utility function parameters to say that humans like to change up their behavior a lot). If you then use the inferred latent variable values, you’re going to be in for a bad time.
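As a minimal sketch of how latent variables get repurposed under mis-specification (my own toy example, not from the post): fit a two-component Gaussian mixture, whose latent variable is the component assignment, to data that has no clusters at all.

```python
# Toy illustration (my own, not from the post): fit a 2-component Gaussian
# mixture (latent variable = component assignment) to data that actually has
# no clusters at all; it comes from a single heavy-tailed distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = rng.standard_t(df=2, size=(10000, 1))  # one heavy-tailed "population"

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_)
# The latent components do not correspond to any real subpopulations; they get
# repurposed to soak up the heavy tails (roughly one narrow and one wide
# component), and more data will not fix this, because no parameter setting of
# this model class can represent the true distribution.
```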

So, under mis-specification, the notion of the “true” value of latent variables is no longer meaningful, and the distribution over latent variables that you learn need not match reality. One potential solution would be counterfactual reasoning, which informally means that your model must be able to make good predictions on many different distributions.

Model Mis-specification and Inverse Reinforcement Learning (Owain Evans and Jacob Steinhardt): While the previous post focused on mis-specification in general, this one looks at inverse reinforcement learning (IRL) in particular. In IRL, the latent variable is the utility function, which predicts the observed variable, behavior. They identify three main categories where mis-specification could harm IRL. First, IRL could misunderstand the actions available to the human. For example, if I accidentally hit someone else due to a reflex, but IRL doesn’t realize it’s a reflex and thinks I could have chosen not to do that, it would infer that I don’t like the other person. In addition, inferring actions is hard, since in many cases we would have to infer actions from video frames, which is a challenging ML problem. Second, IRL could misunderstand what information the human has and what biases they are subject to. If I go to a cafe when it is closed, but IRL thinks that I know it’s closed, it might incorrectly infer a preference for taking a walk. Similarly, if it doesn’t know about the planning fallacy, it might infer that humans don’t care about deadlines. Third, IRL may not realize that humans are making long-term plans, especially if the data it is trained on is short and episodic (a form of mis-specification that seems quite likely). If you see a student studying all the time, you might infer that they like studying, instead of that they want a good grade. Indeed, this inference probably gets you 99% accuracy, since the student does in fact spend a lot of time studying. The general issue is that large changes in the model of the human might only lead to small changes in predictive accuracy, and this gets worse with longer-term plans.
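As a sketch of why the assumed human model matters so much (a hypothetical toy calculation, not from the paper), here is a Bayesian update over two candidate preferences for the cafe example, under two different assumptions about what the human knows; all numbers are made up.

```python
# Hypothetical toy calculation: infer whether the human prefers "coffee" or
# "walking" after observing them walk to a closed cafe, under two different
# assumptions about what the human knows. Likelihoods are made up for illustration.

def posterior(likelihoods, prior=(0.5, 0.5)):
    unnorm = [l * p for l, p in zip(likelihoods, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# P(observed behavior = "walk to the cafe" | preference), under each assumption.
assume_knows_closed = {"prefers_coffee": 0.1, "prefers_walking": 0.8}
assume_thinks_open  = {"prefers_coffee": 0.9, "prefers_walking": 0.8}

for name, lik in [("human knows cafe is closed", assume_knows_closed),
                  ("human thinks cafe is open", assume_thinks_open)]:
    post = posterior([lik["prefers_coffee"], lik["prefers_walking"]])
    print(f"{name}: P(prefers coffee) = {post[0]:.2f}")

# If the IRL system wrongly assumes the human knew the cafe was closed, the same
# observation pushes it towards "prefers walking"; with the correct assumption
# (the human thought it was open), walking to the cafe is decent evidence of a
# coffee preference.
```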

Future directions for ambitious value learning (Rohin Shah): This post is a summary of many different research directions related to ambitious value learning that are currently being pursued.

Agent foundations

What are Universal Inductors, Again? (Diffractor)

Learning human intent

Learning from Demonstration in the Wild (Feryal Behbahani et al) (summarized by Richard): This paper learns traffic trajectories from unlabeled data by converting traffic camera footage into a Unity scene simulation, using that simulation to generate pseudo-LIDAR readings for each “expert trajectory”, and then training an agent to imitate them using a variant of generative adversarial imitation learning (GAIL).

Richard’s opinion: This is a cool example of how huge amounts of existing unlabeled video data might be utilised. The task they attempt is significantly more complex than those in other similar work (such as this paper which learns to play Atari games from Youtube videos); however, this also makes it difficult to judge how well the learned policy performed, and how much potential it has to transfer into the real world.

Handling groups of agents

Multi-Agent Overoptimization, and Embedded Agent World Models (David Manheim): This post and the associated paper argue for the complexity of multiagent settings, where you must build a model of how other agents act, even though they have models of how you act. While game theory already deals with this setting, it only does so by assuming that the agents are perfectly rational, an assumption that doesn’t hold in practice and doesn’t grapple with the fact that your model of the opponent cannot be perfect. The paper lists a few failure modes. Accidental steering happens when one agent takes action without knowledge of what other agents are doing. Coordination failures are exactly what they sound like. Adversarial misalignment happens when one agent chooses actions to mislead a victim agent into taking actions that benefit the first agent. Input spoofing and filtering happen when one agent doctors the training data of a victim agent. Goal co-option occurs when one agent takes control of another agent (possibly by modifying its reward function).

Rohin’s opinion: It’s great to see work on the multiagent setting! This setting does seem quite a bit more complex, and hasn’t been explored very much from the AI safety standpoint. One major question I have is how this relates to the work already done in academia for different settings (typically groups of humans instead of AI agents). Quick takes on how each failure mode relates to existing academic work: accidental steering is novel to me (but I wouldn’t be surprised if there has been work on it), coordination failures seem like a particular kind of (large-scale) prisoner’s dilemma, adversarial misalignment is a special case of the principal-agent problem, and input spoofing and filtering and goal co-option seem like special cases of adversarial misalignment (and are related to ML security, as the paper points out).

Interpretability

Explaining Explanations in AI (Brent Mittelstadt et al)

Adversarial examples

Is Robustness [at] the Cost of Accuracy? (Dong Su, Huan Zhang et al) (summarized by Dan H): This work shows that older architectures such as VGG exhibit more adversarial robustness than newer models such as ResNets. Here they take adversarial robustness to be the average adversarial perturbation size required to fool a network. They use this to show that architecture choice matters for adversarial robustness and that accuracy on the clean dataset is not necessarily predictive of adversarial robustness. A separate observation they make is that adversarial examples created with VGG transfer far better than those created with other architectures. All of these findings are for models without adversarial training.
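As a sketch of the robustness metric used here (the average of the smallest perturbation that fools the network on each input), one simple estimate binary-searches the budget of a one-step L∞ attack; `model` and `loader` below are hypothetical placeholders, and the paper itself uses stronger attacks than FGSM.

```python
# Sketch (my own, hedged): estimate a model's robustness as the average of the
# smallest FGSM perturbation size that flips each correctly classified input.
# `model` and `loader` are hypothetical placeholders, with batch size 1 assumed.
import torch
import torch.nn.functional as F

def min_fgsm_eps(model, x, y, eps_max=0.5, iters=10):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad_sign = torch.autograd.grad(loss, x)[0].sign()
    lo, hi = 0.0, eps_max
    for _ in range(iters):                      # binary search over the budget
        mid = (lo + hi) / 2
        x_adv = (x + mid * grad_sign).detach()
        if model(x_adv).argmax(dim=1).item() != y.item():
            hi = mid                            # still fooled: try a smaller eps
        else:
            lo = mid                            # not fooled: need a larger eps
    return hi                                   # capped at eps_max if never fooled

def average_robustness(model, loader):
    sizes = []
    for x, y in loader:
        if model(x).argmax(dim=1).item() == y.item():  # only correctly classified inputs
            sizes.append(min_fgsm_eps(model, x, y))
    return sum(sizes) / max(len(sizes), 1)
```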

Robustness May Be at Odds with Accuracy (Dimitris Tsipras, Shibani Santurkar, Logan Engstrom et al) (summarized by Dan H): Since adversarial training can markedly reduce accuracy on clean images, one may ask whether there exists an inherent trade-off between adversarial robustness and accuracy on clean images. They use a simple model amenable to theoretical analysis, and for this model they demonstrate a trade-off. In the second half of the paper, they show that adversarial training can improve feature visualization, which has also been shown in several concurrent works.

Adversarial Examples Are a Natural Consequence of Test Error in Noise (Anonymous) (summarized by Dan H): This paper argues that there is a link between model accuracy on noisy images and model accuracy on adversarial images. They establish this empirically by showing that augmenting the dataset with random additive noise can improve adversarial robustness reliably. To establish this theoretically, they use the Gaussian Isoperimetric Inequality, which directly gives a relation between error rates on noisy images and the median adversarial perturbation size. Given that measuring test error on noisy images is easy, given that claims about adversarial robustness are almost always wrong, and given the relation between adversarial noise and random noise, they suggest that future defense research include experiments demonstrating enhanced robustness on nonadversarial, noisy images.
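If I recall the bound correctly (treat the exact form as my assumption rather than a quote from the paper), a model with error rate μ under Gaussian noise N(0, σ²I) must have misclassified points within a median L2 distance of about −σ·Φ⁻¹(μ) of noisy inputs, where Φ is the standard normal CDF. A quick calculation:

```python
# Hedged sketch of the isoperimetric relation as I understand it: with error
# rate mu under Gaussian noise of scale sigma, the median L2 distance from a
# noisy input to the nearest misclassified point is at most -sigma * Phi^{-1}(mu).
from scipy.stats import norm

def median_adversarial_distance_bound(error_rate_in_noise, sigma):
    return -sigma * norm.ppf(error_rate_in_noise)

# e.g. a model with 1% error under sigma = 0.1 Gaussian noise:
print(median_adversarial_distance_bound(0.01, 0.1))  # ~0.23 in L2 distance
```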

Verification

MixTrain: Scalable Training of Formally Robust Neural Networks (Shiqi Wang et al)

Forecasting

AGI-11 Survey (Justis Mills): A survey of participants in the AGI-11 conference (with 60 respondents out of over 200 registrations) found that 43% thought AGI would appear before 2030, 88% thought it would appear before 2100, and 85% believed it would be beneficial for humankind.

Rohin’s opinion: Note there’s a strong selection effect, as AGI is a conference specifically aimed at general intelligence.

Field building

Current AI Safety Roles for Software Engineers (Ozzie Gooen): This post and its comments summarize the AI safety roles available for software engineers (including ones that don’t require ML experience).

Miscellaneous (Alignment)

When does rationality-as-search have nontrivial implications? (nostalgebraist): Many theories of idealized intelligence, such as Solomonoff induction, logical inductors and Bayesian reasoning, involve a large search over a space of strategies, using either the best-performing one or a weighted combination where the weights depend on past performance. However, the procedure that performs the large search is not itself part of the space of strategies—for example, Solomonoff induction searches over the space of computable programs to achieve near-optimality at prediction tasks relative to any computable program, but is itself uncomputable. When we want to actually implement a strategy, we have to choose one of the options from our set, rather than the infeasible idealized version, and the idealized version doesn’t help us do this. It would be like saying that a chess expert is approximating the rule “consult all possible chess players weighted by past performance”—it’s true that these will look similar behaviorally, but they look very different algorithmically, which is what we actually care about for building systems.
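For concreteness, here is a minimal sketch (my own, not from the post) of the shared pattern these theories use, an exponential-weights aggregator over a fixed pool of predictors; note that the aggregation loop itself is not one of the strategies in the pool, which is exactly the post's point.

```python
# Minimal sketch (my own) of "weight strategies by past performance": an
# exponential-weights aggregator over a fixed pool of binary predictors.
# The aggregation loop itself is not a member of `experts`.
import math

def aggregate_predictions(experts, outcomes, eta=0.5):
    weights = [1.0] * len(experts)
    for t, outcome in enumerate(outcomes):
        preds = [expert(t) for expert in experts]   # each expert outputs P(outcome = 1)
        total = sum(weights)
        combined = sum(w * p for w, p in zip(weights, preds)) / total
        # Penalize each expert by its log loss on the realized outcome.
        for i, p in enumerate(preds):
            loss = -math.log(p if outcome == 1 else 1 - p)
            weights[i] *= math.exp(-eta * loss)
        yield combined

# Two toy "strategies": one always says 0.9, one always says 0.3.
experts = [lambda t: 0.9, lambda t: 0.3]
outcomes = [1, 1, 0, 1, 1, 1]
print(list(aggregate_predictions(experts, outcomes)))  # drifts towards the better expert
```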

Rohin’s opinion: I do agree that in the framework outlined in this post (the “ideal” being just a search over “feasible” strategies) the ideal solution doesn’t give you much insight, but I don’t think this is fully true of eg. Bayes rule. I do think that understanding Bayes rule can help you make better decisions, because it gives you a quantitative framework for working with hypotheses and evidence, which even simple feasible strategies can use. (Although I do think that logically-omniscient Bayes does not add much over regular Bayes rule from the perspective of suggesting a feasible strategy to use—but in a world where logically-omniscient Bayes came first, it would have been helpful for deriving the heuristic.) In the framework of the post, this corresponds to the choice of “weight” assigned to each hypothesis, and this is useful because feasible strategies do still look like search (but instead of searching over all hypotheses, you search over a very restricted subset of them). So overall I think I agree with the general thrust of the post, but don’t agree with the original strong claim that ‘grappling with embeddedness properly will inevitably make theories of this general type irrelevant or useless, so that “a theory like this, except for embedded agents” is not a thing that we can reasonably want’.

Beliefs at different timescales (Nisan)

Near-term concerns

Privacy and security

A Marauder’s Map of Security and Privacy in Machine Learning (Nicolas Papernot)

AI strategy and policy

The Vulnerable World Hypothesis (Nick Bostrom) (summarized by Richard): Bostrom considers the possibility “that there is some level of technology at which civilization almost certainly gets destroyed unless quite extraordinary and historically unprecedented degrees of preventive policing and/or global governance are implemented.” We were lucky, for example, that starting a nuclear chain reaction required difficult-to-obtain plutonium or uranium, instead of easily-available materials. In the latter case, our civilisation would probably have fallen apart, because it was (and still is) in the “semi-anarchic default condition”: we have limited capacity for preventative policing or global governance, and people have a diverse range of motivations, many selfish and some destructive. Bostrom identifies four types of vulnerability, which vary by how easily and widely the dangerous technology can be produced, how predictable its effects are, and how strong the incentives to use it are. He also identifies four possible ways of stabilising the situation: restrict technological development, influence people’s motivations, establish effective preventative policing, and establish effective global governance. He argues that the latter two are more promising in this context, although they increase the risks of totalitarianism. Note that Bostrom doesn’t take a strong stance on whether the vulnerable world hypothesis is true, although he claims that it’s unjustifiable to have high credence in its falsity.

Richard’s opinion: This is an important paper which I hope will lead to much more analysis of these questions.

Other progress in AI

Exploration

Contingency-Aware Exploration in Reinforcement Learning (Jongwook Choi, Yijie Guo, Marcin Moczulski et al)

Reinforcement learning

Spinning Up in Deep RL (Joshua Achiam): Summarized in the highlights!

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms? (Andrew Ilyas, Logan Engstrom et al) (summarized by Richard): This paper argues that policy gradient algorithms are very dependent on additional optimisations (such as value function clipping, reward scaling, etc.), and that they operate with poor estimates of the gradient. It also demonstrates that the PPO objective is unable to enforce a trust region, and that the algorithm’s empirical success at doing so is due to the additional optimisations.

Richard’s opinion: While the work in this paper is solid, the conclusions don’t seem particularly surprising: everyone knows that deep RL is incredibly sample-intensive (which straightforwardly implies inaccurate gradient estimates) and relies on many implementation tricks. I’m not familiar enough with PPO to know how surprising their last result is.

Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control (Kendall Lowrey, Aravind Rajeswaran et al)

VIREL: A Variational Inference Framework for Reinforcement Learning (Matthew Fellows, Anuj Mahajan et al)

Learning Shared Dynamics with Meta-World Models (Lisheng Wu, Minne Li et al)

Deep learning

Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing (Jacob Devlin and Ming-Wei Chang)

Learning Concepts with Energy Functions (Igor Mordatch)

AGI theory

A Model for General Intelligence (Paul Yaworsky)
