In section one, where you define the action-value function, you use R for return where I believe you intended to use G.
Current theme: default
Less Wrong (text)
Less Wrong (link)
In section one, where you define the action-value function, you use R for return where I believe you intended to use G.