If EPIC(R1, R2) is thought of as a composition of two functions f(g(R1), g(R2)), where g returns the optimal policy of its input and f is a distance function on optimal policies, then is f(OptimalPolicy1, OptimalPolicy2) a metric?
The authors don’t prove it, but I believe yes, as long as DS and DA have full support over the state space and action space respectively (you may also need DT to put support on every possible transition).
I usually think of this as “EPIC is a metric if defined over the space of equivalence classes of reward functions”.
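A minimal sketch of that equivalence-class view, for a finite MDP. This follows the EPIC construction from the paper (canonicalize each reward against coverage distributions DS and DA, then take the Pearson distance under DT); the helper names `canonicalize`, `pearson_distance`, and `epic`, and all the toy distributions, are made up for illustration:

```python
import numpy as np

def canonicalize(R, dS, dA, gamma=0.9):
    # Canonically shaped reward: C(R)(s,a,s') = R(s,a,s')
    #   + gamma*E[R(s',A,S')] - E[R(s,A,S')] - gamma*E[R(S,A,S')]
    # with S, S' ~ dS and A ~ dA, all independent.
    m = np.einsum('saz,a,z->s', R, dA, dS)   # m(s) = E[R(s, A, S')]
    c = dS @ m                               # c    = E[m(S)]
    return R + gamma * m[None, None, :] - m[:, None, None] - gamma * c

def pearson_distance(X, Y, w):
    # Weighted Pearson distance sqrt((1 - rho) / 2), weights w over (s,a,s').
    x, y, w = X.ravel(), Y.ravel(), w.ravel()
    mx, my = w @ x, w @ y
    cov = w @ ((x - mx) * (y - my))
    sx = np.sqrt(w @ (x - mx) ** 2)
    sy = np.sqrt(w @ (y - my) ** 2)
    rho = cov / (sx * sy)
    return np.sqrt(max(0.0, (1 - rho) / 2))  # clamp tiny negative rounding

def epic(R1, R2, dS, dA, dT, gamma=0.9):
    return pearson_distance(canonicalize(R1, dS, dA, gamma),
                            canonicalize(R2, dS, dA, gamma), dT)

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9
dS, dA = np.full(nS, 1 / nS), np.full(nA, 1 / nA)
dT = np.full((nS, nA, nS), 1.0 / (nS * nA * nS))

R1 = rng.normal(size=(nS, nA, nS))
phi = rng.normal(size=nS)
# R2 is in R1's equivalence class: positive rescaling plus potential shaping.
R2 = 2.0 * R1 + gamma * phi[None, None, :] - phi[:, None, None]
R3 = rng.normal(size=(nS, nA, nS))  # an unrelated reward

print(epic(R1, R2, dS, dA, dT, gamma))  # ~0: same equivalence class
print(epic(R1, R3, dS, dA, dT, gamma))  # > 0: different classes
```

Within an equivalence class (potential shaping plus positive affine rescaling) the distance vanishes, so the triangle inequality and identity-of-indiscernibles only hold once you quotient by those classes.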
Can more than one DT be used, so that there is more than one measure?
Is there a maximal (worst-case) choice of DT?
For finite, discrete state/action spaces, the uniform distribution over (s, a, s’) tuples has maximal entropy. However, it’s not clear that this is the worst case for EPIC.
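The dependence on DT shows up directly in the Pearson stage: the same pair of (canonicalized) reward vectors gets a different distance under different weightings. A toy sketch, with made-up data and a made-up skewed distribution standing in for a non-uniform DT:

```python
import numpy as np

def pearson_distance(x, y, w):
    # Weighted Pearson distance sqrt((1 - rho) / 2) under weights w.
    mx, my = w @ x, w @ y
    cov = w @ ((x - mx) * (y - my))
    sx = np.sqrt(w @ (x - mx) ** 2)
    sy = np.sqrt(w @ (y - my) ** 2)
    return np.sqrt(max(0.0, (1 - cov / (sx * sy)) / 2))

rng = np.random.default_rng(0)
n = 1000  # number of (s, a, s') tuples, flattened
x, y = rng.normal(size=n), rng.normal(size=n)  # stand-ins for canonicalized rewards

uniform = np.full(n, 1.0 / n)          # uniform DT
skewed = rng.dirichlet(np.ones(n))     # some other DT with full support

d_uniform = pearson_distance(x, y, uniform)
d_skewed = pearson_distance(x, y, skewed)
print(d_uniform, d_skewed)  # different values for different DT
```

Since sqrt((1 − rho)/2) lies in [0, 1], every choice of DT yields a distance bounded by 1; the open question above is which DT makes a given pair of rewards look most different, and maximal entropy need not be it.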