Regarding long time horizons: humans seem to handle this problem by planning at high resolution over a short horizon (the coming day or week) and at lower resolution over a long horizon (the coming year or decade). Maybe the AI could use a similar tactic, so that the 40-year planning is done as a game in which each year constitutes a single time-step. I think this may be related to hierarchical reinforcement learning? (The option you outline seems acceptable to me, though.)
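
To make the variable-resolution idea concrete, here is a minimal sketch of what such a time grid might look like. It only builds the schedule of time-steps; it is not a planner and not hierarchical reinforcement learning itself. The names `Step` and `multi_resolution_schedule`, the 7-day fine-grained window, and the 365-day year are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One planning time-step, measured in days from the present."""
    start_day: int
    length_days: int


def multi_resolution_schedule(horizon_years=40, fine_days=7, days_per_year=365):
    """Build a time grid that is fine-grained near the present and coarse later.

    Hypothetical layout (an assumption, not taken from the discussion):
    - the first `fine_days` days get one step per day,
    - the rest of the first year is a single bridging step,
    - every later year is a single step.
    Assumes fine_days < days_per_year.
    """
    steps = []

    # High resolution: one step per day for the near term.
    for day in range(fine_days):
        steps.append(Step(start_day=day, length_days=1))

    # Bridge: the remainder of the first year as one step.
    steps.append(Step(start_day=fine_days, length_days=days_per_year - fine_days))

    # Low resolution: one step per year for the rest of the horizon.
    for year in range(1, horizon_years):
        steps.append(Step(start_day=year * days_per_year, length_days=days_per_year))

    return steps


if __name__ == "__main__":
    schedule = multi_resolution_schedule()
    print(len(schedule), "steps covering",
          sum(s.length_days for s in schedule), "days")
```

With these defaults the 40-year horizon is covered by 47 steps instead of the 14,600 steps a uniform daily grid would need, which is the saving the coarse long-range game is meant to capture.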