Charlie Steiner comments on Hedonic Loops and Taming RL

Charlie Steiner 19 Jul 2023 23:09 UTC
6 points
2
I don’t think linearity of policies is the answer at all. If I drive up to an intersection, I might want a policy of muscle contractions that causes me to turn left or to turn right, but I don’t want to turn an entropy-weighted angle somewhere between 90 and −90 degrees.
Approximate linearity might work better—each drive outputting preferences for different high-level policies, and policies being “added” by randomly selecting just one to implement. This doesn’t work very well for feed-forward networks, but it works for recurrent networks—you can fall into an attractor state where you competently turn the car left over many time steps, even though you also had a chance of falling into the attractor state of turning right competently.
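To make the contrast concrete, here is a minimal Python sketch of the two ways of "adding" policies described above. Everything in it is an illustrative assumption rather than anything from the post: the `POLICIES` table, the sign convention (left as -90, right as +90), and the function names `blend` and `select_and_commit`. Holding the sampled choice fixed for several steps is only a crude stand-in for a recurrent network settling into an attractor.

```python
import random

# Hypothetical high-level policies, expressed as a steering angle in degrees.
# Sign convention (left negative, right positive) is an assumption.
POLICIES = {"left": -90.0, "right": 90.0}

def blend(prefs):
    """Linear mixing: preference-weighted average of the steering angles."""
    total = sum(prefs.values())
    return sum(POLICIES[name] * weight for name, weight in prefs.items()) / total

def select_and_commit(prefs, steps, rng):
    """Sample ONE policy in proportion to preference, then hold it for every step."""
    names = list(prefs)
    choice = rng.choices(names, weights=[prefs[n] for n in names], k=1)[0]
    return choice, [POLICIES[choice]] * steps

prefs = {"left": 0.5, "right": 0.5}

# Averaging two equally preferred turns cancels out: the car drives straight on.
print(blend(prefs))  # 0.0

# Sampling once and committing yields a full turn in a single direction.
choice, trajectory = select_and_commit(prefs, steps=5, rng=random.Random(0))
print(choice, trajectory)
```

Re-sampling the choice independently at every time step, which is roughly what a stateless feed-forward policy would do, could alternate between left and right and produce neither turn competently. That is the failure the commit step is meant to avoid.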
- Seth Herd 20 Jul 2023 15:16 UTC
  4 points
  0
  Excellent point. The basal ganglia is thought to address this problem by “gating” one motor plan while suppressing the others that narrowly lost the competition for selection. It probably performs a similar function in abstract decision-making. See my paper Neural mechanisms of human decision-making (linked in another comment on this post) for more on this.
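The gating idea in Seth Herd's reply, where one plan is let through and the near-miss alternatives are suppressed, can be illustrated with a toy winner-take-all competition. This is a hedged sketch, not a model of the basal ganglia: the update rule, the `rate` parameter, and the function name `compete` are all assumptions chosen for simplicity, and the rule is only intended for the two-plan case shown.

```python
def compete(activations, rate=0.5, steps=20):
    """Toy mutual-inhibition loop: each plan is pushed up by its lead over its
    rivals and pushed down by their lead over it, clipped to the range [0, 1]."""
    x = dict(activations)
    for _ in range(steps):
        new_x = {}
        for plan, value in x.items():
            others = [v for p, v in x.items() if p != plan]
            rival = sum(others) / len(others)  # average activation of competing plans
            new_x[plan] = min(1.0, max(0.0, value + rate * (value - rival)))
        x = new_x  # synchronous update: every plan sees the previous step's values
    return x

# A narrow 0.51 vs 0.49 lead is amplified until the loser is fully suppressed.
print(compete({"left": 0.51, "right": 0.49}))  # {'left': 1.0, 'right': 0.0}
```

Once the activations reach 1.0 and 0.0 they stay there on further updates, so the end state behaves like a simple attractor: the winning plan stays selected rather than being re-contested at each step.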