Moral AI: Options

Epistemic status: One part quotes (informative, accurate), one part speculation (not so accurate).

One avenue towards AI safety is the construction of “moral AI” that is good at solving the problem of human preferences and values. Five FLI grants have recently been funded that pursue different lines of research on this problem.

The projects, in alphabetical order:

Most contemporary AI systems base their decisions solely on consequences, whereas humans also consider other morally relevant factors, including rights (such as privacy), roles (such as in families), past actions (such as promises), motives and intentions, and so on. Our goal is to build these additional morally relevant features into an AI system. We will identify morally relevant features by reviewing theories in moral philosophy, conducting surveys in moral psychology, and using machine learning to locate factors that affect human moral judgments. We will use and extend game theory and social choice theory to determine how to make these features more precise, how to weigh conflicting features against each other, and how to build these features into an AI system. We hope that eventually this work will lead to highly advanced AI systems that are capable of making moral judgments and acting on them.

Techniques: Top-down design, game theory, moral philosophy
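The abstract doesn't say how conflicting features would be weighed against each other. As a purely illustrative sketch (my invention, not the grantees'), here is the simplest version: each candidate action gets a score on a few hypothetical morally relevant features, and a weighted sum picks the winner.

```python
# Toy sketch (not from the grant): scoring candidate actions on several
# hypothetical morally relevant features and aggregating with fixed weights.
# Feature names, weights, and scores are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    features: dict  # feature name -> score in [-1, 1]

# Assumed weights over morally relevant features (made up for illustration).
WEIGHTS = {
    "consequences": 1.0,   # expected welfare impact
    "rights":       1.5,   # e.g. respects privacy
    "promises":     1.0,   # keeps past commitments
    "roles":        0.5,   # fulfils role obligations
}

def moral_score(action: Action) -> float:
    """Weighted sum of the action's morally relevant feature scores."""
    return sum(WEIGHTS[f] * v for f, v in action.features.items())

actions = [
    Action("share user data", {"consequences": 0.3, "rights": -0.9,
                               "promises": -0.5, "roles": 0.0}),
    Action("ask for consent", {"consequences": 0.1, "rights": 0.8,
                               "promises": 0.4, "roles": 0.2}),
]

best = max(actions, key=moral_score)
print(f"preferred action: {best.name} (score {moral_score(best):.2f})")
```

A weighted sum is far cruder than the game-theoretic and social-choice machinery the abstract has in mind, but it shows the basic shape: rights, roles, and promises enter the judgment as explicit terms rather than being folded into a single consequence estimate.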

Previous work in economics and AI has developed mathematical models of preferences, along with algorithms for inferring preferences from observed actions. [Citation of inverse reinforcement learning] We would like to use such algorithms to enable AI systems to learn human preferences from observed actions. However, these algorithms typically assume that agents take actions that maximize expected utility given their preferences. This assumption of optimality is false for humans in real-world domains. Optimal sequential planning is intractable in complex environments and humans perform very rough approximations. Humans often don't know the causal structure of their environment (in contrast to MDP models). Humans are also subject to dynamic inconsistencies, as observed in procrastination, addiction and in impulsive behavior. Our project seeks to develop algorithms that learn human preferences from data despite the suboptimality of humans and the behavioral biases that influence human choice. We will test our algorithms on real-world data and compare their inferences to people's own judgments about their preferences. We will also investigate the theoretical question of whether this approach could enable an AI to learn the entirety of human values.

Techniques: Trying to find something better than inverse reinforcement learning, supervised learning from preference judgments
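The standard IRL assumption being criticized here is that observed behavior maximizes expected utility. One common way to relax it (a guess at the flavor of approach, not necessarily what this project will do) is a noisy-rationality or Boltzmann choice model: the human picks actions with probability proportional to exp(β × utility), and candidate preference hypotheses are scored by the likelihood they assign to the observed choices. Everything in the sketch below (the actions, the two hypotheses, the rationality parameter β) is made up for illustration.

```python
# Toy sketch of noisy-rational ("Boltzmann") preference inference.
# Not the grant's algorithm; just one standard way to drop the assumption
# that observed humans act optimally. All numbers here are made up.

import numpy as np

actions = ["salad", "burger", "skip lunch"]

# Two candidate preference hypotheses: utility of each action under each.
hypotheses = {
    "values health":      np.array([1.0, 0.2, 0.0]),
    "values convenience": np.array([0.2, 1.0, 0.5]),
}

BETA = 2.0  # rationality parameter: 0 = random choice, large = near-optimal

def choice_probs(utilities: np.ndarray, beta: float = BETA) -> np.ndarray:
    """P(action) under a Boltzmann-rational choice model."""
    z = beta * utilities
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Observed (noisy, suboptimal) choices: mostly salad, sometimes burger.
observed = ["salad", "salad", "burger", "salad"]
obs_idx = [actions.index(a) for a in observed]

# Posterior over hypotheses given a uniform prior.
log_post = {h: np.log(choice_probs(u)[obs_idx]).sum()
            for h, u in hypotheses.items()}
norm = np.logaddexp.reduce(list(log_post.values()))
for h, lp in log_post.items():
    print(f"P({h} | data) = {np.exp(lp - norm):.2f}")
```

Note that this only softens the optimality assumption; it doesn't touch the harder problems the abstract lists, like wrong causal models or dynamic inconsistency.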

The future will see autonomous agents acting in the same environment as humans, in areas as diverse as driving, assistive technology, and health care. In this scenario, collective decision making will be the norm. We will study the embedding of safety constraints, moral values, and ethical principles in agents, within the context of hybrid human/agents collective decision making. We will do that by adapting current logic-based modelling and reasoning frameworks, such as soft constraints, CP-nets, and constraint-based scheduling under uncertainty. For ethical principles, we will use constraints specifying the basic ethical “laws”, plus sophisticated prioritised and possibly context-dependent constraints over possible actions, equipped with a conflict resolution engine. To avoid reckless behavior in the face of uncertainty, we will bound the risk of violating these ethical laws. We will also replace preference aggregation with an appropriately developed constraint/value/ethics/preference fusion, an operation designed to ensure that agents' preferences are consistent with the system's safety constraints, the agents' moral values, and the ethical principles of both individual agents and the collective decision making system. We will also develop approaches to learn ethical principles for artificial intelligent agents, as well as predict possible ethical violations.

Techniques: Top-down design, obeying ethical principles/laws, learning ethical principles
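A minimal sketch of the constraint-filtering step described above, with invented constraint names, priorities, and probabilities: hard ethical “laws” reject any action whose estimated violation risk exceeds a bound, and lower-priority soft constraints break ties among the survivors. The real proposal uses richer machinery (soft constraints, CP-nets, a conflict resolution engine); this only shows the prioritized structure.

```python
# Toy sketch of prioritized ethical constraints with a risk bound.
# Constraint names, priorities, and probabilities are illustrative only.

RISK_BOUND = 0.05  # maximum tolerated probability of violating a hard law

# Candidate actions with estimated violation probabilities for hard "laws"
# and penalty scores for lower-priority soft constraints.
candidates = {
    "swerve left": {"p_harm": 0.20, "soft_penalty": 0.1},
    "brake hard":  {"p_harm": 0.01, "soft_penalty": 0.4},
    "continue":    {"p_harm": 0.03, "soft_penalty": 0.0},
}

def admissible(info: dict) -> bool:
    """Hard ethical laws: reject actions whose violation risk is too high."""
    return info["p_harm"] <= RISK_BOUND

def choose(candidates: dict) -> str:
    """Among admissible actions, minimise the soft-constraint penalty."""
    ok = {a: info for a, info in candidates.items() if admissible(info)}
    if not ok:
        raise RuntimeError("no action satisfies the hard ethical constraints")
    return min(ok, key=lambda a: ok[a]["soft_penalty"])

print("chosen action:", choose(candidates))
```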

The objectives of the proposed research are (1) to create a mathematical framework in which fundamental questions of value alignment can be investigated; (2) to develop and experiment with methods for aligning the values of a machine (whether explicitly or implicitly represented) with those of humans; (3) to understand the relationships among the degree of value alignment, the decision-making capability of the machine, and the potential loss to the human; and (4) to understand in particular the implications of the computational limitations of humans and machines for value alignment. The core of our technical approach will be a cooperative, game-theoretic extension of inverse reinforcement learning, allowing for the different action spaces of humans and machines and the varying motivations of humans; the concepts of rational metareasoning and bounded optimality will inform our investigation of the effects of computational limitations.

Techniques: Trying to find something better than inverse reinforcement learning (differently this time), creating a mathematical framework, whatever rational metareasoning is
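The proposal doesn't spell out the game, but a cooperative extension of IRL presumably looks something like this toy: the human knows the reward function, the machine doesn't, the machine updates its beliefs from the human's observed behavior, and then acts to maximize the human's expected reward. All of the specifics below (the two preference hypotheses, the observation model, the numbers) are invented for illustration.

```python
# Toy sketch of a cooperative value-alignment game in the spirit of the
# proposal (human knows the reward, machine doesn't, both act in the same
# world). The setup and numbers are invented; the actual framework the
# grant develops is more general (different action spaces, metareasoning,
# computational limits).

import numpy as np

thetas = ["likes tea", "likes coffee"]
prior = np.array([0.5, 0.5])

# Reward to the human, indexed by (theta, drink the machine prepares).
drinks = ["tea", "coffee"]
reward = np.array([[1.0, 0.0],    # human likes tea
                   [0.0, 1.0]])   # human likes coffee

# Human's observable action: what they drank yesterday. A noisily-rational
# human mostly picks their preferred drink.
p_obs_given_theta = np.array([[0.8, 0.2],
                              [0.2, 0.8]])

def machine_policy(observed_drink: str) -> str:
    """Update belief over theta from the human's action, then choose the
    drink with the highest expected human reward."""
    o = drinks.index(observed_drink)
    posterior = prior * p_obs_given_theta[:, o]
    posterior /= posterior.sum()
    expected = posterior @ reward       # expected reward of each drink
    return drinks[int(expected.argmax())]

print("human drank tea    -> machine prepares", machine_policy("tea"))
print("human drank coffee -> machine prepares", machine_policy("coffee"))
```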

Autonomous AI systems will need to understand human values in order to respect them. This requires having similar concepts as humans do. We will research whether AI systems can be made to learn their concepts in the same way as humans learn theirs. Both human concepts and the representations of deep learning models seem to involve a hierarchical structure, among other similarities. For this reason, we will attempt to apply existing deep learning methodologies for learning what we call moral concepts, concepts through which moral values are defined. In addition, we will investigate the extent to which reinforcement learning affects the development of our concepts and values.

Techniques: Trying to identify learned moral concepts, unsupervised learning
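One concrete way to "identify learned moral concepts" (my reading of the techniques line, not something the abstract commits to) is to train a linear probe on a model's internal representations and check whether a concept like "involves breaking a promise" is linearly decodable. The sketch below fakes the representations with random vectors that have the concept baked into one direction, just to show the probing step itself.

```python
# Toy sketch of "probing" a learned representation for a moral concept.
# The 8-dimensional "embeddings" below are random stand-ins for the
# representations a deep model would actually produce; the concept label
# and all data are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Pretend embeddings of 200 scenario descriptions. We bake the concept into
# one direction of the space so the probe has something to find.
X = rng.normal(size=(200, 8))
concept_direction = rng.normal(size=8)
y = (X @ concept_direction + 0.3 * rng.normal(size=200) > 0).astype(float)

# Linear probe: logistic regression trained by plain gradient descent.
w = np.zeros(8)
b = 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)          # clip logits for stability
    p = 1.0 / (1.0 + np.exp(-z))             # predicted P(concept present)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * (p - y).mean()

accuracy = (((X @ w + b) > 0).astype(float) == y).mean()
print(f"probe accuracy: {accuracy:.2f}")  # high accuracy => linearly decodable
```

High probe accuracy would only show that the concept is present in the representation, not that the system uses it the way humans do.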

The elephant in the room is that making judgments that always respect human preferences is nearly FAI-complete. Applying human ethics depends on human preferences in general, which depend on a model of the world and how actions impact it. Whether an action counts as ethical can also depend on the space of possible actions, requiring a good judgment-maker to be capable of searching for good actions. Any “moral AI” we build with our current understanding is going to have to be limited and/or unsatisfactory.

Limitations might be things like judging which of two actions is “more correct” rather than finding correct actions, only taking input in the form of a paragraph's worth of words, or only producing good outputs for situations similar to some combination of training situations.

Two of the proposals are centered on top-down construction of a system for making ethical judgments. When designing a system by hand, it's nigh-impossible to capture the subtleties of human values. Relatedly, a hand-designed system seems weak at generalizing to novel situations, unless the specific sort of generalization has been foreseen and covered. The good points of a top-down approach are that it can capture things that are important but make up only a small part of the description, or are not easily identified by statistical properties. A top-down model of ethics might be used as a fail-safe, sometimes noticing when something undesirable is happening, or as a starting point for a richer learned model of human preferences.

Other proposals are inspired by inverse reinforcement learning. Inverse reinforcement learning seems like the sort of thing we want, since it observes actions and infers preferences, but it's very limited. The problem of needing a very good model of the world in order to infer human preferences well rears its head here. There are also likely unforeseen technical problems in ensuring that what it learns is actually human preferences (rather than human foibles or irrelevant patterns), though this is, in part, why this research should be carried out now.

Some proposals want to take advantage of learning using neural networks, trained on people's actions or judgments. This sort of approach is very good at discovering patterns, but not so good at treating patterns as a consequence of underlying structure. Such a learner might be useful as a heuristic, or as a way to fill in a more complicated, specialized architecture. For this approach, like the others, it seems important to make the most progress toward learning human values in a way that doesn't require a very good model of the world.