I’m not sure IRL actually ignores this, although in such a case the value learning agent may never converge on a consistent policy.