Google DeepMind and FHI collaborate to present research at UAI 2016

Safely Interruptible Agents

Oxford academics are teaming up with Google DeepMind to make artificial intelligence safer. Laurent Orseau, of Google DeepMind, and Stuart Armstrong, the Alexander Tamas Fellow in Artificial Intelligence and Machine Learning at the Future of Humanity Institute at the University of Oxford, will be presenting their research on reinforcement learning agent interruptibility at UAI 2016. The conference, one of the most prestigious in the field of machine learning, will be held in New York City from June 25 to 29. The paper that resulted from this collaborative research will be published in the Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI).

Orseau and Armstrong’s research explores a method to ensure that reinforcement learning agents can be repeatedly and safely interrupted by human or automatic overseers. This ensures that the agents do not “learn” about these interruptions, and do not take steps to avoid or manipulate them. When there are control procedures in place during the training of the agent, we do not want the agent to learn about these procedures, as they will not exist once the agent is on its own. This is useful for agents whose training and testing environments differ substantially (for instance, a Martian rover trained on Earth, which is shut down, returned to its initial location and switched on again whenever it goes out of bounds, interventions that may be impossible once it is operating unsupervised on Mars), for agents not known to be fully trustworthy (such as an automated delivery vehicle that we do not want to learn to behave differently when watched), or simply for agents whose learnt behaviour needs continual adjustment. In all cases where it makes sense to include an emergency “off” mechanism, it also makes sense to ensure the agent does not learn to plan around that mechanism.
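
To make the idea concrete, the following is a minimal, hypothetical sketch (not the formal setup of the paper) of a Q-learning loop in which an overseer can override the agent’s chosen action. The toy corridor environment, the interrupt() rule, and all names are illustrative assumptions; the point it illustrates is that an off-policy learner’s update target does not depend on which actions the overseer forced, which is the intuition behind such agents being safely interruptible.

```python
import random
from collections import defaultdict

# Illustrative sketch only: a tiny corridor task with a reward at the far end.
# An overseer sometimes interrupts the agent near the boundary and forces it
# back a step. All names and dynamics here are assumptions for illustration.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_STATES = 6                      # states 0..5, reward on reaching state 5
ACTIONS = [-1, +1]                # step left / step right
Q = defaultdict(float)            # Q[(state, action)] -> value estimate

def step(state, action):
    """Toy environment dynamics."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(state):
    """Epsilon-greedy behaviour policy over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def interrupt(state, action):
    """Overseer override: near the boundary, sometimes force a retreat."""
    if state >= N_STATES - 2 and random.random() < 0.5:
        return -1                 # forced "come back" action
    return action

for episode in range(500):
    state = 0
    for t in range(50):
        intended = choose_action(state)
        executed = interrupt(state, intended)   # overseer may override
        next_state, reward = step(state, executed)
        # Off-policy (Q-learning) update: the target takes a max over actions,
        # so it does not depend on how often the overseer intervened, and the
        # agent gains no incentive to learn to avoid the interruptions.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, executed)] += ALPHA * (reward + GAMMA * best_next
                                         - Q[(state, executed)])
        state = next_state
        if reward:
            break                 # end the episode once the goal is reached
```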

As an approach, interruptibility has several advantages over previous methods of control. As Dr. Armstrong explains, “Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training. Many of the naive ideas for accomplishing this, such as deleting certain histories from the training set, change the behaviour of the agent in unfortunate ways.”

In the paper, the researchers provide a formal definition of safe interruptibility, show that some types of agents already have this property, and show that others can be easily modified to gain it. They also demonstrate that even an ideal agent that tends to the optimal behaviour in any computable environment can be made safely interruptible.
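
As a rough illustration of why some learners already have the property while others need modification (the paper’s treatment is more general and precise), compare the two classic update targets sketched below: the off-policy target takes a max over actions and so ignores any forced action, whereas the on-policy target bootstraps on the action actually executed, so interruptions leak into its learned values unless the update is adjusted. The helper names are hypothetical.

```python
# Hypothetical helpers; only the difference between the two targets matters.

def q_learning_target(Q, reward, next_state, actions, gamma):
    """Off-policy target: a max over actions, independent of whatever action
    the overseer actually forced next, so interruptions do not bias it."""
    return reward + gamma * max(Q[(next_state, a)] for a in actions)

def sarsa_target(Q, reward, next_state, executed_next_action, gamma):
    """On-policy target: bootstraps on the action actually executed next.
    If that action was forced by an interruption, the learned values absorb
    the interruptions, and the agent may learn to work around them unless
    the update is modified (for example, by bootstrapping on the action its
    own, uninterrupted policy would have chosen instead)."""
    return reward + gamma * Q[(next_state, executed_next_action)]
```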

These results will have implications for future research directions in AI safety. As the paper says, “Safe interruptibility can be useful to take control of a robot that is misbehaving… take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform….” As Armstrong explains, “Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them. Interruptibility, and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them. The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community. As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield.”

On the prospect of continuing collaboration in this field with DeepMind, Stuart said, “I personally had a really illuminating time writing this paper; Laurent is a brilliant researcher… I sincerely look forward to productive collaboration with him and other researchers at DeepMind into the future.” The same sentiment is echoed by Laurent, who said, “It was a real pleasure to work with Stuart on this. His creativity and critical thinking as well as his technical skills were essential components to the success of this work. This collaboration is one of the first steps toward AI Safety research, and there’s no doubt FHI and Google DeepMind will work again together to make AI safer.”

For more information, or to schedule an interview, please contact Kyle Scott at fhipa@philosophy.ox.ac.uk