Viliam comments on How is reinforcement learning possible in non-sentient agents?

Viliam 12 Jan 2021 22:54 UTC
3 points
Seems to me that there must be more to pain and pleasure than mere −1 and +1 signals, because there are multiple ways to make some behavior more or less likely. Pain and pleasure are one such option, habits are another, and unconscious biases yet another. Each of them makes some behavior more likely and some other behavior less likely, but they feel quite different from the inside. Compared to habits and unconscious biases, pain and pleasure have some extra quality because of how they are implemented in our bodies.
Simple RL agents, unless they have specific circuits for feeling pain and pleasure, are in my opinion more analogous to habits or unconscious biases.
- Brian_Tomasik 3 Nov 2021 11:53 UTC
  1 point
  Parent
  Thanks. :) What do you mean by “unconscious biases”? Do you mean unconscious RL, like how the muscles in our legs might learn to walk without us being aware of the feedback they’re getting? (Note: I’m not an expert on how our leg muscles actually learn to walk, but maybe it’s RL of some sort.) I would agree that simple RL agents are more similar to that. I think these systems can still be considered marginally conscious to themselves, even if the parts of us that talk have no introspective access to them, but they’re much less morally significant than the parts of us that can talk.
  
  Perhaps pain and pleasure are what we feel when getting punishment and reward signals that are particularly important for our high-level brains to pay attention to.