Can I ask you to unwind the fundamentals a step further, and say why you and neuroscientists in general believe the brain operates by RL and has a reward function? And how far down the scale of life these have been found?
Oh, it’s definitely controversial—as I always say, there is never a neuroscience consensus. My sense is that a lot of the controversy is about how broadly to define “reinforcement learning”.
If you use a narrow definition like “RL is exactly those algorithms that are on arxiv cs.AI right now with an RL label”, then the brain is not RL.
If you use a broad definition like “RL is anything with properties like Thorndike’s law of effect”, then, well, remember that “reinforcement learning” was a psychology term long before it was an AI term!
If it helps, I was arguing about this with a neuroscientist friend (Eli Sennesh) earlier this year, and wrote the following summary (not necessarily endorsed by Eli) afterwards in my notes:
Eli doesn’t like the term “RL” in a brain context because of (1) its implication that “reward” is stuff in the environment, as opposed to an internal “reward function” built from brain-internal signals, and (2) its implication that we’re specifically maximizing an exponentially-discounted sum of future rewards.
…Whereas I like the term “RL” because (1) If brain-like algorithms showed up on GitHub, then everyone in AI would call it an “RL algorithm”, put it in “RL textbooks”, and use it to solve “RL problems”, (2) This follows the historical usage (there’s reinforcement, and there’s learning, per Thorndike’s Law of Effect etc.).
When I want to talk about “the brain’s model-based RL system”, I should translate that to “the brain’s Bellman-solving system” when I’m talking to Eli, and then we’ll be more-or-less on the same page I think?
…But Eli is just one guy; I think there are probably dozens of other schools-of-thought with their own sets of complaints or takes on “RL”.
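For concreteness, here’s the jargon from my notes above in its standard textbook form (nothing brain-specific, just the usual definitions). The “exponentially-discounted sum of future rewards” is the return

$$G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1,$$

and “Bellman-solving” means finding the fixed point of the Bellman optimality equation

$$V(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V(s') \,\bigr].$$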
“how far down the scale of life these have been found?”
I don’t view this as particularly relevant to understanding human brains, intelligence, or AGI, but since you asked, if we define RL in the broad (psych-literature) sense, then here’s a relevant book excerpt:
Pavlovian conditioning occurs in a naturally brainless species, sea anemones, but it is also possible to study protostomes that have had their brains removed. An experiment by Horridge[130] demonstrated response–outcome conditioning in decapitated cockroaches and locusts. Subsequent studies showed that either the ventral nerve cord[131,132] or an isolated peripheral ganglion[133] suffices to acquire and retain these memories.
In a representative experiment, fine wires were inserted into two legs from different animals. One of the legs touched a saline solution when it was sufficiently extended, a response that completed an electrical circuit and produced the unconditioned stimulus: shock. A yoked leg received shock simultaneously. The two legs differed in that the yoked leg had a random joint angle at the time of the shock, whereas the master leg always had a joint angle large enough for its “foot” to touch the saline. Flexion of the leg reduced the joint’s angle and terminated the shock. After one leg had been conditioned, both legs were then tested independently. The master leg flexed sufficiently to avoid shock significantly more frequently than the yoked leg did, demonstrating a response–outcome (R–O) memory. —Evolution of Memory Systems
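If a concrete (and deliberately cartoonish) model helps: here’s a toy simulation of that yoked-control logic, where the only learning rule is the law of effect. The function name, parameters, and update rule are mine for illustration, not taken from the experiment:

```python
import random

def run_trials(n_trials=2000, lr=0.05, seed=0):
    """Toy law-of-effect sketch of the Horridge yoked-control experiment.

    The master leg is shocked whenever it is extended; flexion terminates
    the shock, so shock-offset is contingent on the master's own response
    and (per the law of effect) strengthens flexion. The yoked leg gets
    the very same shocks, but at random joint angles, so shock-offset
    carries no response-outcome contingency for it and, in this toy
    model, produces no net change in its flexion tendency.
    """
    rng = random.Random(seed)
    p_flex_master = 0.5  # probability the master leg flexes on a trial
    p_flex_yoked = 0.5   # same for the yoked leg; never reinforced
    for _ in range(n_trials):
        master_flexed = rng.random() < p_flex_master
        if not master_flexed:
            # Extended master leg touches the saline, closing the circuit:
            # both legs are shocked, and the master's subsequent flexion
            # ends the shock. Only that contingent response is reinforced.
            p_flex_master = min(1.0, p_flex_master + lr)
    return p_flex_master, p_flex_yoked
```

With these settings the master leg’s flexion probability should climb toward 1 while the yoked leg’s stays put, mirroring the result in the excerpt: same shocks, different contingency, different learning.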