Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

[Epistemic status: ¯\_(ツ)_/¯ ]

Armstrong and Mindermann write about a no free lunch theorem for inverse reinforcement learning (IRL): the same action can reflect many different combinations of values and (irrational) planning algorithms.

I think even assuming humans were fully rational expected utility maximizers, there would be an important underdetermination problem with IRL and with all other approaches that infer human preferences from their actual behavior. This is probably obvious if and only if it’s correct, and I don’t know if any non-straw people disagree, but I’ll expand on it anyway.

Consider two rational expected utility maximizing humans, Alice and Bob.

Alice is, herself, a value learner. She wants to maximize her true utility function, but she doesn’t know what it is, so in practice she uses a probability distribution over several possible utility functions to decide how to act.

If Alice received further information (from a moral philosopher, maybe), she’d start maximizing a specific one of those utility functions instead. But we’ll assume that her information stays the same while her utility function is being inferred, and she’s not doing anything to get more; perhaps she’s not in a position to.

Bob, on the other hand, isn’t a value learner. He knows what his utility function is: it’s a weighted sum of the same several utility functions. The relative weights in this mix happen to be identical to Alice’s relative probabilities.

Alice and Bob will act the same. They’ll maximize the same linear combination of utility functions, for different reasons. But if you could find out more than Alice knows about her true utility function, then you’d act differently if you wanted to truly help Alice than if you wanted to truly help Bob.
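The equivalence (and the divergence once more is known) can be sketched in a few lines of code. Everything here is hypothetical illustration: the candidate utility functions, the actions, and the 0.4/0.6 weights are made up, not from the argument above. The point is just linearity of expectation: maximizing an expectation over utility functions is the same as maximizing the corresponding weighted-sum utility function.

```python
ACTIONS = ["donate", "save", "volunteer"]

# Two hypothetical candidate utility functions Alice is uncertain between.
def u_hedonic(action):
    return {"donate": 0.3, "save": 0.9, "volunteer": 0.5}[action]

def u_preference(action):
    return {"donate": 0.8, "save": 0.2, "volunteer": 0.6}[action]

CANDIDATES = [u_hedonic, u_preference]
WEIGHTS = [0.4, 0.6]  # Alice's credences, which equal Bob's fixed weights

def alice_choice():
    # Alice: maximize expected utility under her distribution
    # over possible true utility functions.
    def expected_u(a):
        return sum(p * u(a) for p, u in zip(WEIGHTS, CANDIDATES))
    return max(ACTIONS, key=expected_u)

def bob_choice():
    # Bob: maximize his one known utility function, which is
    # exactly the weighted sum of the same candidates.
    def mixed_u(a):
        return 0.4 * u_hedonic(a) + 0.6 * u_preference(a)
    return max(ACTIONS, key=mixed_u)

def best_for(u):
    # What a fully informed helper would do, given a known utility function.
    return max(ACTIONS, key=u)

# Observing behavior alone: Alice and Bob pick the same action.
print(alice_choice(), bob_choice())
# If a helper somehow learned Alice's true utility function was u_hedonic,
# the action that truly helps Alice can differ from what helps Bob:
print(best_for(u_hedonic))
```

Under these made-up numbers, Alice and Bob both pick the same action, while a helper who learned Alice’s true utility function was `u_hedonic` would act on its maximizer instead; no amount of watching their identical behavior distinguishes the two cases.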

So in some cases, it’s not enough to look at how humans behave. Humans are Alice on some points and Bob on some points. Figuring out details will require explicitly addressing human moral uncertainty.