The KL-divergence seems pretty principled here—the idea behind natural abstractions is that there are convergent abstractions that minds will hit upon, as backed up by some of their theorems. They deal with diagrams that are approximately satisfied, where you’ve learned the diagram (e.g. defined some latent variable that you put into your models). You would expect minds to prefer modelling the environment in ways that have low KL-divergence, because that’s the thing that tells you how much extra cost you are paying for using the wrong distribution to compress and predict.
The KL-divergence seems pretty principled here—the idea behind natural abstractions is that there are convergent abstractions that minds will hit upon, as backed up by some of their theorems. They deal with diagrams that are approximately satisfied, where you’ve learned the diagram (e.g. defined some latent variable that you put into your models). You would expect minds to prefer modelling the environment in ways that have low KL-divergence, because that’s the thing that tells you how much extra cost you are paying for using the wrong distribution to compress and predict.