If EPIC(R1, R2) is thought of as a composition f(g(R1), g(R2)), where g returns the optimal policy of its input and f is a distance function on optimal policies, is f(OptimalPolicy1, OptimalPolicy2) a metric?

The authors don’t prove it, but I believe yes, as long as D_S and D_A put support over the entire state space / action space (maybe you also need D_T to put support over every possible transition).

I usually think of this as “EPIC is a metric if defined over the space of equivalence classes of reward functions”.
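To make the equivalence-class view concrete, here is a minimal sketch for a small finite MDP (assuming the standard EPIC construction: canonicalize each reward under uniform D_S and D_A, then take the Pearson distance; all variable names here are mine, not from the paper). Rewards that differ only by potential shaping and positive rescaling land in the same equivalence class, so their EPIC distance is ~0:

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma = 4, 3, 0.9

# Reward tensors R[s, a, s'] for a finite MDP.
R1 = rng.normal(size=(n_s, n_a, n_s))

# R2 = positive scaling of R1 plus potential shaping:
# R2(s, a, s') = c * R1(s, a, s') + gamma * phi(s') - phi(s)
phi = rng.normal(size=n_s)
R2 = 2.5 * R1 + gamma * phi[None, None, :] - phi[:, None, None]

def canonicalize(R, gamma):
    """Canonically shaped reward with uniform D_S over states, D_A over actions."""
    # E_{A, S'}[R(x, A, S')] as a function of x: mean over action and next-state axes.
    m_state = R.mean(axis=(1, 2))
    # E_{S, A, S'}[R(S, A, S')]: mean over everything.
    m_all = R.mean()
    # C(R)(s,a,s') = R(s,a,s') + gamma*E[R(s',A,S')] - E[R(s,A,S')] - gamma*E[R(S,A,S')]
    return (R
            + gamma * m_state[None, None, :]
            - m_state[:, None, None]
            - gamma * m_all)

def epic(R_a, R_b, gamma):
    """Pearson distance between canonicalized rewards under uniform coverage."""
    x = canonicalize(R_a, gamma).ravel()
    y = canonicalize(R_b, gamma).ravel()
    rho = np.corrcoef(x, y)[0, 1]
    return np.sqrt(max(0.0, 1.0 - rho) / 2.0)

print(epic(R1, R2, gamma))                          # ~0: same equivalence class
print(epic(R1, rng.normal(size=R1.shape), gamma))   # clearly positive: unrelated reward
```

Canonicalization sends every purely potential-shaped term to zero, and the Pearson distance ignores positive scaling, which is exactly why EPIC is only a pseudometric on raw reward functions but a metric on equivalence classes.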

Can more than one D_T be used, so there’s more than one measure?

Yes.

There’s a maximum?

For finite, discrete state/action spaces, the uniform distribution over (s, a, s’) tuples has maximal entropy. However, it’s not clear that that’s the worst case for EPIC.
