Hi Giorgi,
Not an expert on this, but I believe the idea is that over time the agent will learn to assign negligible probabilities to actions that don't do anything. For instance, imagine a game where the agent can move in four directions, but if there's a wall in front of it, moving forward does nothing. The agent will eventually learn to stop moving forward in that circumstance. So you could probably make it work, even if it's a bit less efficient, by simply having the environment do nothing when an invalid action is selected.
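To make the "do nothing on invalid actions" idea concrete, here's a minimal sketch (not from the original discussion, just an illustration) of a tiny grid-world environment where moving into a wall is treated as a no-op. The grid size, goal position, and reward scheme are all made up for the example; the only point is the `step` method leaving the state unchanged when the move is invalid.

```python
# Hypothetical grid world: 4 movement actions, invalid moves are no-ops.
class GridWorld:
    # up, down, left, right as (row delta, col delta)
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = goal
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # If the move would leave the grid (i.e. walk into a wall),
        # do nothing: the state stays the same and the step is wasted.
        if 0 <= r < self.size and 0 <= c < self.size:
            self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0  # illustrative sparse reward
        return self.pos, reward, done
```

Since wasted steps never earn reward, a policy-gradient or value-based agent should gradually push the probability of "move into the wall" toward zero in those states, which is the learning behavior described above; explicitly masking invalid actions would just get there faster.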