We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1]
At least, the tabular algorithms are proven to converge, but no one uses those for real stuff. I’m not sure what the results are for function approximators, but I think you get my point. ↩︎
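For concreteness, here is a rough sketch of what the tabular case looks like, using Q-learning as one example of such an algorithm. Everything specific in it is an assumption made up for illustration and not something from this thread: the four-state chain environment, the names `step` and `q_learning`, the discount of 0.9, the fixed learning rate, and the uniformly random behaviour policy.

```python
import random

# Hypothetical toy MDP: a chain of states 0..3 where state 3 is terminal.
N_STATES = 4
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9           # assumed discount factor


def step(state, action):
    """Deterministic chain: reward 1 only on entering the terminal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: no future value once the episode has ended.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

In this toy setting the greedy policy should pick "right" in every non-terminal state, which is the optimal policy here. The constant learning rate is enough only because the environment is deterministic; the usual convergence proofs for stochastic environments assume a suitably decaying learning rate and that every state-action pair is visited infinitely often.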
Is the point that people try to use algorithms which they think will eventually converge to the optimal policy? (Assuming there is one.)
Something like that, yeah.