Reinforcement learning is easy to conceptualize. The ingredient people tend to miss is that we explicitly specify the algorithm that maximizes reward. So this is disanalogous to humans: to train your 5yo, you need only give the reward, and the 5yo may adapt their behavior because they value the reward; in a reinforcement learning agent, that second step occurs only because we make it occur. You could just as well flip a sign in the algorithm and have it pursue minimal reward instead.
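To make that concrete, here is a minimal sketch of a tabular Q-learning update (toy code, not any particular library's API; the environment, state/action names, and hyperparameters are made up). The "maximize reward" part is literally a line we wrote, and passing sign=-1 makes the very same machinery pursue minimal reward instead.

```python
from collections import defaultdict

# Q-values for (state, action) pairs in a hypothetical toy environment.
Q = defaultdict(float)
ALPHA, GAMMA = 0.1, 0.99  # learning rate and discount factor (assumed values)

def q_update(state, action, reward, next_state, actions, sign=+1):
    """One tabular Q-learning step.

    sign=+1 learns to maximize reward; sign=-1 trains on the negated
    reward, so the identical update rule learns to minimize it instead.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    target = sign * reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

Nothing in the update "values" the reward; the direction of optimization is just a sign we chose.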
Thanks! I think my question is deeper: why do machines 'want', or 'have a goal', to follow the algorithm that maximizes reward? How can machines 'find stuff rewarding'?
For current systems, the answer is that, as far as anyone knows, they don't find things rewarding or want things. But they can still run a search that optimizes a training signal, and that is enough to give you an agent.
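A toy illustration of that point (made-up signal, plain hill climbing rather than any real training algorithm): the loop below mechanically keeps whichever candidate scores higher on a signal. Nothing in it wants or finds anything rewarding, yet the result looks goal-directed.

```python
import random

def training_signal(x: float) -> float:
    # Hypothetical signal: peaks at x = 3.
    return -(x - 3.0) ** 2

x = 0.0
for _ in range(10_000):
    candidate = x + random.gauss(0.0, 0.1)  # propose a small perturbation
    if training_signal(candidate) > training_signal(x):
        x = candidate  # keep it only if the signal improves
print(round(x, 2))  # ends up near 3.0
```

The search is doing all the work; "wanting" never enters into it.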