Yeah, it’s definitely related. The main thing I want to point out is that Shapley values similarly require a model in order to calculate. So you have to distinguish between the problem of calculating a detailed distribution of credit and being able to assign credit “at all”—in artificial neural networks, backprop is how you assign detailed credit, but a loss function is how you get a notion of credit at all. Hence the question “where do gradients come from?”—a reward function is like a pile of money made from a joint venture; but to apply backprop or Shapley value, you also need a model of counterfactual payoffs under a variety of circumstances. This is a problem if you don’t have a separate “epistemic” learning process to provide that model—ie, it’s a problem if you are trying to create one big learning algorithm that does everything.
Specifically, you don’t automatically know how to “send rewards to each contributor proportional to how much they improved the actual group decision,” because in the cases I’m interested in, ie online learning, you don’t have the option of “rerunning it without them and seeing how performance declines”—because you need a model in order to rerun.
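The point about needing a model can be made concrete: the standard Shapley formula averages a player’s marginal contribution over every coalition of the other players, so the characteristic function v must be queryable at coalitions that never actually occurred. A minimal sketch (the particular payoff function is a made-up illustration, not anything from the discussion above):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for characteristic function v.

    Note: v must return a payoff for *every* subset of players --
    this is the "model of counterfactual payoffs" at issue. Observing
    the actual joint outcome only gives you v(all players); the rest
    are reruns you can't perform without a model.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Weight of coalitions of size k in the Shapley average.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p to coalition S -- requires
                # evaluating v at two counterfactual coalitions.
                total += weight * (v(frozenset(S) | {p}) - v(frozenset(S)))
        values[p] = total
    return values

# Hypothetical example: a venture whose payoff is the squared coalition
# size, so all players are symmetric and each gets v(N)/3 = 3 (up to
# float rounding).
players = ["a", "b", "c"]
v = lambda S: len(S) ** 2
print(shapley_values(players, v))
```

Counting the calls makes the problem vivid: even this three-player toy evaluates v at coalitions like {a} and {a,b} that the “actual run” never produced, and the number of such counterfactual evaluations grows as 2^n.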
But, also, I think there are further distinctions to make. I believe that if you tried to apply Shapley value to neural networks, it would go poorly; and presumably there should be a “philosophical” reason why this is the case (why Shapley value is solving a different problem than backprop). I don’t know exactly what the relevant distinction is.
(Or maybe Shapley value works fine for NN learning; but, I’d be surprised.)