Hello,
Thanks for the great article. A general question: what happens if the action space in the environment is state-dependent? In that case, if I use an "Atari-like" neural network to approximate the Q function, it will also produce values Q(s, a) for infeasible state-action pairs. In practice, I could just ignore these pairs, but will that create any problems, theoretically speaking? If so, could you give a quick suggestion on how to fix this, or where to look for a solution?
Thanks!
Hi Giorgi,
Not an expert on this, but I believe the idea is that, over time, the agent will learn to assign negligible value (or probability) to actions that don't do anything. For instance, imagine a game where the agent can move in four directions, but if there's a wall in front of it, moving forward does nothing. The agent will eventually learn to stop moving forward in that situation. So you could probably make it work, even if it's a bit less efficient, by simply having the environment do nothing when an invalid action is selected.
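For what it's worth, another common trick people use here is action masking: set the Q-values of infeasible actions to minus infinity before taking the argmax, so they can never be selected. Here's a minimal NumPy sketch (the function name and the feasibility mask are hypothetical, and it assumes the environment can tell you which actions are valid in the current state):

```python
import numpy as np

def select_action(q_values, valid_mask):
    """Greedy action selection that ignores infeasible actions.

    q_values:   1-D array of Q(s, a) estimates, one entry per action.
    valid_mask: boolean array, True where the action is feasible in state s.
    """
    masked = np.where(valid_mask, q_values, -np.inf)  # infeasible actions -> -inf
    return int(np.argmax(masked))

# Example: four actions, but "move forward" (index 0) is blocked by a wall.
q = np.array([1.2, 0.3, -0.5, 0.8])
mask = np.array([False, True, True, True])
print(select_action(q, mask))  # prints 3
```

The same mask can also be applied when taking the max over next-state actions in the Bellman target, so the bootstrapped values ignore infeasible actions as well.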