Quick comments on “The case against economic values in the brain” by Benjamin Hayden & Yael Niv:
(I really only skimmed the paper, these are just impressions off the top of my head.)
I agree that “eating this sandwich” doesn’t have a reward prediction per se, because there are lots of different ways to think about eating this sandwich, especially what aspects are salient, what associations are salient, what your hormones and mood are, etc. If neuroeconomics is premised on reward predictions being attached to events and objects rather than thoughts, then I don’t like neuroeconomics either, at least not as a mechanistic theory of psychology. [I don’t know anything about neuroeconomics, maybe that was never the idea anyway.]
But when they float the idea of throwing out rewards altogether, I’m not buying it. The main reason is: I’m trying to understand what the brain does algorithmically, and I feel like I’m making progress towards a coherent picture … and part of that picture is a one-dimensional signal called reward. If you got rid of that, I just have no idea how to fill that gap. That doesn’t mean it’s impossible, but I did try to think it through and failed.
There’s also a nice biological story that goes along with the algorithmic story: the basal ganglia have a dense web of connections across the frontal lobe, and can just memorize “this meaningless set of neurons firing is associated with this reward, that meaningless set of neurons firing is associated with that reward, etc. etc.” Then they (1) inhibit all but the highest-reward-predicting activity, and (2) update the reward predictions based on what actually happens (TD learning). (Again, this and everything else here is very sketchy and speculative.)
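To make that memorize-gate-update loop concrete, here’s a toy Python sketch (entirely my own illustration, not from the paper; the class name and the “pattern” strings are made up): a dictionary maps arbitrary activity patterns to scalar reward predictions, gating keeps only the highest-predicting candidate, and a standard TD update adjusts the prediction based on what actually happened.

```python
class RewardMemorizer:
    """Toy 'basal ganglia': memorize reward predictions for arbitrary
    activity patterns, gate all but the best, TD-update afterwards."""

    def __init__(self, learning_rate=0.1, discount=0.9):
        self.values = {}          # pattern -> predicted reward (one scalar each)
        self.lr = learning_rate
        self.gamma = discount

    def predict(self, pattern):
        # A never-before-seen pattern predicts zero reward.
        return self.values.get(pattern, 0.0)

    def gate(self, candidate_patterns):
        # (1) inhibit all but the highest-reward-predicting activity
        return max(candidate_patterns, key=self.predict)

    def td_update(self, pattern, reward, next_pattern=None):
        # (2) update the prediction from what actually happened (TD learning)
        bootstrap = self.gamma * self.predict(next_pattern) if next_pattern else 0.0
        td_error = reward + bootstrap - self.predict(pattern)
        self.values[pattern] = self.predict(pattern) + self.lr * td_error
        return td_error


bg = RewardMemorizer()
bg.td_update("reach-for-sandwich", reward=1.0)   # prediction moves toward 1.0
bg.td_update("reach-for-stapler", reward=0.0)    # prediction stays at 0.0
winner = bg.gate(["reach-for-sandwich", "reach-for-stapler"])
print(winner)  # reach-for-sandwich
```

The point of the dictionary is that the keys really are “meaningless” from the learner’s perspective; all the structure lives in which patterns get presented and what rewards follow.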
(DeepMind had a paper arguing that dopamine encodes a probability distribution over predicted reward rather than a single reward-prediction value, which is fine; that’s still consistent with the rest of my story.)
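That distributional twist slots into the same kind of sketch with almost no change. Here’s a toy Python version (again my own illustration; the patterns and numbers are made up): keep a handful of predicted-reward samples per pattern instead of one scalar, and gate on their mean.

```python
import statistics

# Toy distributional variant: several predicted-reward samples per pattern
# (made-up numbers), instead of a single scalar prediction.
predictions = {
    "reach-for-sandwich": [0.5, 1.0, 1.5],
    "reach-for-stapler":  [0.0, 0.0, 0.1],
}

def expected_reward(pattern):
    # Gating can still be driven by a single number: the distribution's mean.
    return statistics.mean(predictions[pattern])

dist_winner = max(predictions, key=expected_reward)
print(dist_winner)  # reach-for-sandwich
```

The gating step is unchanged, which is the sense in which the distributional story stays consistent with the scalar one.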
I get how deep neural nets can search for a policy directly, but I don’t think those methods are consistent with the other things I believe about the brain (or at least the neocortex). In particular, the brain does seem to have a mechanism for choosing among different possible actions being considered in parallel, as opposed to a single learned function straight from sensory input to motor output. The paper also mentions learning to compare options without learning a value, but I don’t think that scales: there are too many possible comparisons (the square of the number of possible thoughts), whereas a scalar value needs just one prediction per thought.
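That counting worry is easy to make concrete with a toy Python sketch (my own illustration; the hash-based “values” are arbitrary stand-ins for something learned): with scalar values you learn one number per thought, and every comparison falls out for free, whereas learning direct pairwise preferences means learning an entry for every pair of thoughts.

```python
from itertools import combinations

thoughts = [f"thought-{i}" for i in range(100)]

# Option A: one learned scalar value per thought; any comparison is free.
values = {t: hash(t) % 7 for t in thoughts}   # arbitrary stand-in for learning
n_values_learned = len(values)

# Option B: learn every pairwise preference directly, with no shared value.
preferences = {(a, b): values[a] >= values[b]
               for a, b in combinations(thoughts, 2)}
n_pairs_learned = len(preferences)

print(n_values_learned, n_pairs_learned)  # 100 4950
```

100 thoughts already need 4,950 pairwise entries; the gap grows quadratically, which is why I don’t see comparison-without-value working at the scale of “all possible thoughts.”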