> a metric, a feature sorely lacking from …
I have a pet peeve around this, which is hopefully a useful comment for someone to read: KL-divergence should not be symmetric, because of what it actually is. If you’re using KL-divergence and thinking to yourself “I wish this were symmetric”, that should be a red flag that you’re using the wrong tool!
I think it’s easy for people to think, “hm, I’d like a way to quantify how different two probability distributions are from each other”, and then grab the nearest hammer, which happens to be KL-divergence. But mathematical definitions aren’t just tools for things; they mean things.
You should use KL-divergence when you want to measure the cost of modelling a true distribution using a false distribution. The asymmetry comes from the fact that one of them is the true one (and therefore the one that you take the expected value with respect to).
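To make that concrete: D_KL(P ‖ Q) = E_{x∼P}[log P(x)/Q(x)], and the expectation is taken under P, the distribution you’re treating as true, which is exactly where the asymmetry comes from. Here’s a minimal Python sketch (the two distributions are made up purely for illustration) showing that swapping the roles of “true” and “model” gives a different number:

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q): expected extra cost (in nats) of encoding samples drawn
    # from the true distribution p using a code optimised for q.
    # The expectation is taken with respect to p -- hence the asymmetry.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two made-up discrete distributions over three outcomes.
p = [0.9, 0.05, 0.05]      # treat this one as the "true" distribution
q = [1 / 3, 1 / 3, 1 / 3]  # and this one as the model

print(kl_divergence(p, q))  # ~0.70 nats
print(kl_divergence(q, p))  # ~0.93 nats -- not the same: KL is not symmetric
```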
(I have no idea what y’all are using KL-divergence for, so I have no opinion about whether you should have been using it in this theorem.)