Pattern comments on Richard Ngo’s Shortform

Pattern 27 Apr 2020 3:39 UTC
2 points
We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1]
[1] At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point.
Is the point that people try to use algorithms which they think will eventually converge to the optimal policy? (Assuming there is one.)
- TurnTrout 27 Apr 2020 3:55 UTC
  2 points
  Something like that, yeah.
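
For a concrete picture of the "tabular algorithms are proven" case in the footnote, here is a minimal sketch of tabular Q-learning checked against value iteration. Everything in it is an illustrative assumption and not something from the thread: the four-state chain environment, the function names, and the hyperparameters. The constant step size works only because this toy environment is deterministic; the standard convergence results for stochastic environments assume decaying step sizes and that every state-action pair keeps being visited.

```python
import random

N_STATES = 4          # hypothetical chain: states 0..3, state 3 is terminal
GOAL = N_STATES - 1
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9


def step(state, action):
    """Deterministic chain: move left or right; reward 1 on reaching GOAL."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value after the terminal state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def value_iteration(sweeps=200):
    """Reference optimal Q-values computed from the known dynamics."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(GOAL):
            for a in ACTIONS:
                ns, r, done = step(s, a)
                q[s][a] = r if done else r + GAMMA * max(q[ns])
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    learned, optimal = q_learning(), value_iteration()
    print("learned policy:", greedy_policy(learned))
    print("optimal policy:", greedy_policy(optimal))
```

The learner never sees the transition function and explores purely at random, yet its table should end up matching the one computed from the known dynamics. That is the sense of "converges to the optimal policy" here, and it says nothing about what happens once the table is replaced by a function approximator.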