Why I am not currently working on the AAMLS agenda

(Note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)

The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project aimed at determining how to safely use hypothetical, highly advanced machine learning systems. I was previously working on problems in this agenda and am currently not.

The agenda

See the paper. The agenda lists eight theoretical problems relevant to aligning AI systems substantially similar to current machine learning systems.


Around March 2016, I had thoughts about research prioritization: I thought it made sense for AI safety researchers to spend more time thinking about machine learning systems. Around the same time, some other researchers updated towards shorter timelines. I had some discussions with Eliezer, Paul, Nate, and others, and came up with a list of problems that seemed useful to think about.

Then some of us (mostly me, with significant help from others) wrote up the paper about the problems. The plan was for some subset of the researchers to work on them.

Progress since the paper

Since writing the paper, progress has been slow:

  • I had concrete thoughts about inductive ambiguity identification (with Ryan's help); some of this is written up on the forum here, here, here, here. Retrospectively, this line of thinking seems like a dead end, though I'm not highly confident of this judgment.

  • Some researchers and I have thought about many of the problems and gained a slightly improved conceptualization of them, but this improved conceptualization is still quite vague and hasn't led to concrete progress towards solutions.

Why was little progress made?

I think the main reason is that the problems were very difficult. In particular, they were mostly selected on the basis of "this seems important and plausibly solvable", rather than on any strong intuition that it was possible to make progress.

In comparison, problems in the agent foundations agenda have seen more progress:

  • Logical uncertainty (definability of truth, reflective oracles, logical inductors)

  • Decision theory (modal UDT, reflective oracles, logical inductors)

  • Vingean reflection (model polymorphism, logical inductors)

One thing to note about these problems is that they were formulated on the basis of a strong intuition that they ought to be solvable. Before logical induction, it was possible to have the intuition that some sort of asymptotic approach could solve many logical uncertainty problems in the limit. It was also possible to have a strong intuition that some sort of self-trust is possible.

With problems in the AAMLS agenda, the plausibility argument was something like:

  • Here's an existing, flawed approach to the problem (e.g. using a reinforcement signal for environmental goals, or modifications of this approach)

  • Here's a vague intuition about why it's possible to do better (e.g. humans do a different thing)

which, empirically, turned out not to make for tractable research problems.

Going for the throat

In an important sense, the AAMLS agenda is "going for the throat" in a way that other agendas (e.g. the agent foundations agenda) are only to a lesser extent: it attempts to solve the whole alignment problem (including goal specification) given access to resources such as powerful reinforcement learning. Thus, the difficulties of the whole alignment problem (e.g. specification of environmental goals) are more exposed in its problems.

Theory vs. empiricism

Personally, I strongly lean towards theoretical rather than empirical approaches. I don't know how much I endorse this bias for the set of people working on AI safety as a whole, but it is definitely a personal bias of mine.

Problems in the AAMLS agenda turned out not to be very amenable to purely theoretical investigation. This is probably because there is no clear mathematical aesthetic for determining what counts as a solution (e.g. for the environmental goals problem, it's not actually clear that there's a recognizable mathematical statement of what the problem is).

With the agent foundations agenda, there's a clearer aesthetic for recognizing good solutions. Most of the problems in the AAMLS agenda have a less-clear aesthetic. (There are probably additional ways of investigating the AI alignment problem in a highly aesthetic fashion other than the agent foundations agenda, but I don't know of them yet.)

Doing other things

Perhaps relatedly, given how hard the problems were, other things repeatedly felt better to think about and work on than AAMLS:

  • Logical induction (math related to it, and the paper) (around September 2016)

  • Thinking about why Paul and Eliezer disagree; some thoughts written up here and here (November-December 2016)

  • The benign induction problem and weird philosophy related to it (January-February 2017)

  • Social epistemology and strategy (February-April 2017)

That is, though I was officially lead on AAMLS, I mostly did other things in that time period. I think this was mostly correct (though it unfortunately made the official story somewhat misleading): I intuitively expect that the other things I did had a greater payoff than working on AAMLS would have.

Relevant updates I've made

I've made some updates (some due to AAMLS, some not) that make AAMLS look like a worse idea now than before.

Against plausibility arguments

As discussed above, I included problems based on plausibility rather than a strong intuition that the problem is solvable. I've updated against this being a useful research strategy; I think a strong intuition that something is solvable is a better guide to what to work on. Note that strong intuitions can be miscalibrated; however, even in those cases there is still a strong model behind the intuition that can be tested by pursuing the research the intuition implies.

In favor of lots of philosophical hardness

I've updated in favor of the proposition that essential AI safety problems (especially those related to benign induction, bounded logical uncertainty, and environmental goals) are philosophically hard rather than only mathematically hard. That is: just taking our current philosophical thinking and attempting to formalize it will fail, because our current philosophical thinking is confused.

The main reason for this intuition is that, after thinking about these problems for a significant time, I notice that in near mode I don't expect to be able to find satisfying solutions (e.g. a particular thing and a mathematical proof related to that thing which yields high confidence it will work; it's hard to imagine what the premises or conclusions of such a proof would be). So it looks like large ontological shifts will be necessary even to get to the stage of picking the right problems to formalize and solve.

Against particular agendas

I've moved towards a research approach that is less "rigid" than working on a particular agenda. Every particular research agenda for AI alignment that I know of (agent foundations, AAMLS, Concrete Problems in AI Safety, Paul's agenda) offers a useful perspective on the problem, but is quite limited in itself. Each agenda does some combination of (a) containing "impossible" problems and (b) ignoring large parts of the AI safety problem. If the overall alignment problem is solved, it will probably be solved by researchers obtaining new, not-currently-existing perspectives on the problem.

In general I think the purpose of technical agendas is something like:

  • offering problems for people to puzzle over (this can be a good introduction to AI alignment)

  • offering a useful perspective on the problem (breaking it into some set of subproblems, such that the breaking-up reveals something important)

  • containing tractable problems at least somewhat related to the overall alignment problem (such that the view of the overall problem changes after solving one of the agenda problems)

Against research being optimized for outside understandability

I've updated against the idea that research should be significantly optimized for being understandable to outsiders. (I previously considered understandability a significant point in favor of working on AAMLS, but not one of the main considerations.) The intuitions in favor of this type of research are fairly obvious:

  • it can get more people to work on AI safety

  • it can result in getting more social credit (e.g. money, prestige) for research

I now have additional intuitions against:

  • discernment ability: people who aren't alignment researchers have less ability to discern good research from bad, so requiring research to be understandable to them creates local pressures in favor of worse research. Furthermore, there's a narrative force in favor of confusing "research good according to people with low discernment ability" with "research that's actually good".

  • "mainstream" epistemology being corrupt: see my post on this topic.

Overall it still seems like outside understandability is weakly net-positive, but I don't plan to use it as a significant optimization criterion when deciding which research to do (i.e. I'll aim to just do research that is good according to my aesthetics and figure out how to make it understandable later).

The current state of the agenda

  • I would still recommend the paper to people. I think, for someone who hasn't spent a lot of time thinking about AI safety, it is helpful to have lists of problems and approaches to think about. The agenda conveys a certain style of thinking about AI alignment that I think is valuable (though it turned out to be difficult to develop further).

  • I am continuing to think about how to use ML systems to safely do things that reduce existential risk, and am using ML abstractions to think about AI alignment in general, without focusing as much on specific agenda problems. I think this is useful.

  • I think people from a machine learning background who want to think about AI alignment should start by thinking about problems such as those in Paul's research path, the AAMLS agenda, the Concrete Problems in AI Safety agenda, and the agent foundations agenda, but should additionally aim to develop their own inside view of how to solve the overall alignment problem.