The KL-divergence seems pretty principled here: the idea behind natural abstractions is that there are convergent abstractions that minds will hit upon, as backed up by some of their theorems. Those theorems deal with diagrams that are approximately satisfied once you've learned the diagram (e.g. defined some latent variable that you put into your models). You would expect minds to prefer modelling the environment in ways that have low KL-divergence, because the KL-divergence tells you how much extra cost you pay for using the wrong distribution to compress and predict.
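A minimal numerical sketch of that last claim (the distributions here are made up for illustration): if the environment follows p but you compress with a code built for q, the expected extra bits per sample you pay is exactly D_KL(p || q).

```python
import numpy as np

# Toy distributions over the same 4 outcomes (purely illustrative).
p = np.array([0.5, 0.25, 0.125, 0.125])   # "true" environment distribution
q = np.array([0.25, 0.25, 0.25, 0.25])    # the model's (wrong) distribution

# Expected code length in bits when coding samples drawn from p...
entropy_p = -np.sum(p * np.log2(p))          # ...with a code optimal for p
cross_entropy_pq = -np.sum(p * np.log2(q))   # ...with a code optimal for q

# The gap between the two is the KL-divergence D_KL(p || q).
kl_pq = np.sum(p * np.log2(p / q))

print(cross_entropy_pq - entropy_p)  # 0.25 extra bits per sample
print(kl_pq)                         # 0.25
```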
(I have no idea what y’all are using KL-divergence for, so I have no opinion about whether you should have been using it in this theorem.)