Which formal properties of the KL-divergence do the proofs of your result use? It could be useful to make them all explicit to help generalize to other divergences or metrics between probability distributions.
The appendices make heavy use of additivity across independent variables (and across factorizations more generally), which is the main thing I’d expect to need to work around in order to use other divergences/metrics.
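
To make the additivity property concrete, here is a toy numerical check. It is only a sketch, not something taken from the appendices: the distributions and the helper names (`kl`, `tv`, `product_dist`) are made up for illustration. It shows that KL-divergence adds across independent variables, while total variation distance, picked here as one example of an alternative metric, does not.

```python
import math
from itertools import product


def kl(p, q):
    """KL-divergence D(p || q) in nats, for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tv(p, q):
    """Total variation distance, used here only as a contrasting example."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def product_dist(a, b):
    """Joint distribution of two independent variables, flattened to a list."""
    return [x * y for x, y in product(a, b)]


# Hypothetical marginals for two independent binary variables.
p1, q1 = [0.7, 0.3], [0.5, 0.5]
p2, q2 = [0.2, 0.8], [0.6, 0.4]

p_joint = product_dist(p1, p2)
q_joint = product_dist(q1, q2)

# KL of the product distributions vs. sum of the per-variable KLs.
kl_joint = kl(p_joint, q_joint)
kl_sum = kl(p1, q1) + kl(p2, q2)

# Same comparison for total variation distance.
tv_joint = tv(p_joint, q_joint)
tv_sum = tv(p1, q1) + tv(p2, q2)

print(f"KL: joint={kl_joint:.6f}, sum of marginals={kl_sum:.6f}")
print(f"TV: joint={tv_joint:.6f}, sum of marginals={tv_sum:.6f}")
```

The two KL numbers agree up to floating-point error, while the two TV numbers differ. Any proof step that splits a divergence over a factorization would need a replacement argument (for example, a subadditivity bound) for a metric like this.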