John_Maxwell comments on Thoughts on reward engineering

John_Maxwell 25 Jan 2019 10:22 UTC
3 points
0
Regarding long time horizons, it seems like the way humans handle this problem is to plan in high resolution over a short time horizon (the coming day or the coming week) and lower resolution over a long time horizon (the coming year or the coming decade). It seems like maybe the AI could use a similar tactic, so the 40-year planning is done with a game where each year constitutes a single time-step. I think maybe this is related to hierarchical reinforcement learning? (The option you outline seems acceptable to me though.)