Thanks for this post! Relatedly, Simon DeDeo had a thread on different ways the KL-divergence pops up in many fields:
Kullback-Leibler divergence has an enormous number of interpretations and uses: psychological, epistemic, thermodynamic, statistical, computational, geometrical… I am pretty sure I could teach an entire graduate seminar on it.
Psychological: an excellent predictor of where attention is directed. http://ilab.usc.edu/surprise/
Epistemic: a normative measure of where you ought to direct your experimental efforts (maximize expected model-breaking) http://www.jstor.org/stable/4623265
Thermodynamic: a measure of the work you can extract from an out-of-equilibrium system as it relaxes to equilibrium.
Statistical: too many to count, but (e.g.) a measure of the failure of an approximation method. https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained
Computational (machine learning): a measure of model inefficiency—the extent to which it retains useless information. https://arxiv.org/abs/1203.3271
Computational (compression): the extent to which a compression algorithm designed for one system fails when applied to another.
Geometrical: the (non-metric!) connection when one extends differential geometry to the probability simplex.
Biological: the extent to which subsystems co-compute.
Machine learning: the basic loss function for autoencoders, deep learning, etc. (people call it the “cross-entropy”)
Algorithmic fairness: how to optimally constrain a prediction algorithm when ensuring compliance with laws on equitable treatment. https://arxiv.org/abs/1412.4643
Cultural evolution: a metric (we believe) for the study of individual exploration and innovation tasks… https://www.sciencedirect.com/science/article/pii/S0010027716302840
Digital humanities: Kullback-Leibler divergence is related to TFIDF, but with much nicer properties when it comes to coarse-graining. (The most distinctive words have the highest partial-KL when teasing apart documents; stopwords have the lowest.) http://www.mdpi.com/1099-4300/15/6/2246
Mutual information: Well, it’s a special case of Kullback-Leibler—the extent to which you’re surprised by (arbitrary) correlations between a pair of variables if you believe they’re independent.
Statistics: it’s the underlying justification for the Akaike Information Criterion, used for model selection.
Philosophy of mind: It’s the “free energy” term in the predictive brain account of perception and consciousness. See Andy Clark’s new book or https://link.springer.com/article/10.1007%2Fs11229-017-1534-5
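For anyone who wants to play with the first few of these, here is a minimal sketch of the discrete KL divergence in Python (NumPy only; the example distributions are arbitrary and just for illustration). It shows the statistical reading, KL as the price paid for approximating one distribution with another, and the machine-learning one, where the usual cross-entropy loss is just the data's entropy plus the KL gap:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as arrays, in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # only terms with p > 0 contribute
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def entropy(p):
    p = np.asarray(p, float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

def cross_entropy(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))

p_true = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # "true" distribution
q_model = np.array([0.2, 0.2, 0.2, 0.2, 0.2])  # a cruder model of it

# How badly the model approximates the truth, in extra nats per sample.
print(kl_divergence(p_true, q_model))          # ~0.139
print(kl_divergence(p_true, p_true))           # 0.0

# Cross-entropy = entropy of the truth + KL gap, so minimizing cross-entropy
# over the model q is the same as minimizing D_KL(p_true || q).
print(np.isclose(cross_entropy(p_true, q_model),
                 entropy(p_true) + kl_divergence(p_true, q_model)))  # True
```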
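The mutual-information item is the same computation applied to a joint distribution versus the product of its marginals. A small self-contained check (the joint table here is made up):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions flattened to 1-D arrays, in nats."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A joint distribution over two binary variables (rows: X, columns: Y).
joint = np.array([[0.30, 0.10],
                  [0.10, 0.50]])
p_x = joint.sum(axis=1)            # marginal of X
p_y = joint.sum(axis=0)            # marginal of Y
independent = np.outer(p_x, p_y)   # what the joint would be under independence

# Mutual information I(X; Y) = D_KL(joint || product of marginals):
# your surprise at the correlations if you had assumed independence.
print(kl_divergence(joint, independent))  # ~0.178 nats
```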
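And on the digital-humanities point, here is a toy version of the partial-KL idea: each word's term in the KL sum scores how strongly it distinguishes one bag of words from another. The two tiny "documents" and the add-one smoothing are assumptions for illustration, not the method of the linked paper:

```python
from collections import Counter
import numpy as np

def word_distribution(tokens, vocab, smoothing=1.0):
    """Add-one-smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    freqs = np.array([counts[w] + smoothing for w in vocab], dtype=float)
    return freqs / freqs.sum()

doc_a = "the cat sat on the mat the cat purred".split()
doc_b = "the dog ran in the park the dog barked".split()
vocab = sorted(set(doc_a) | set(doc_b))

p = word_distribution(doc_a, vocab)
q = word_distribution(doc_b, vocab)

# Each word's contribution to D_KL(p || q): large positive terms mark words
# distinctive of doc_a, while shared stopwords like "the" contribute ~0.
partial_kl = p * np.log(p / q)
for word, score in sorted(zip(vocab, partial_kl), key=lambda x: -x[1]):
    print(f"{word:>7s}  {score:+.3f}")
```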
Thanks for this comment, I found it useful.
What did you want to write at the end of the penultimate paragraph?