Additional note: I think you can implement the approach I sketched (which I’m calling “reversible probabilistic programming”) using autoencoders. Represent each layer of the autoencoder by a single-layer neural net f : a -> Distr b together with an approximate inverse g : b -> Distr a. Given a distribution x : Distr a for the previous layer, get the distribution for the next layer by taking Unsamp g (InvFmap (\(x, y) -> (y, x)) (Samp f x)) :: Distr b. Compose many of these to get a multi-layer generative model. This seems simple enough that there’s probably a simpler, more direct way to estimate the entropy of a generative model represented by an autoencoder.
(Actually, I think this might not work, because the inverses won’t be very accurate, but maybe something along these lines does?)
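Here is a minimal sketch, in Haskell to match the notation above, of how the entropy bookkeeping for one layer could work. The definitions of Samp, InvFmap and Unsamp come from the earlier sketch and aren't restated here, so this assumes hypothetical semantics: a Distr is a finite distribution carrying a running entropy estimate; Samp adds the conditional entropy of f; InvFmap leaves the estimate unchanged because the swap is a bijection; and Unsamp subtracts the cross-entropy of g. The lowercase functions samp, invFmap and unsamp, along with mkDistr, noisyCopy and bayesInverse, are illustrative names only. Load the module in GHCi to try it.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for Distr (not the original definition): a finite
-- distribution plus a running estimate of its entropy, in nats.
data Distr a = Distr
  { outcomes   :: [(a, Double)]  -- (outcome, probability) pairs
  , entropyEst :: Double         -- tracked entropy estimate
  }

-- Shannon entropy (nats) of a finite list of weighted outcomes.
exactEntropy :: [(a, Double)] -> Double
exactEntropy ps = negate (sum [p * log p | (_, p) <- ps, p > 0])

-- Build a Distr whose tracked estimate is its exact entropy; used for the
-- per-input outputs of f and g.
mkDistr :: [(a, Double)] -> Distr a
mkDistr ps = Distr ps (exactEntropy ps)

-- Total probability that d assigns to outcome a.
probOf :: Eq a => a -> Distr a -> Double
probOf a d = sum [q | (a', q) <- outcomes d, a' == a]

-- Samp f x (assumed semantics): draw a from x, then b from f a, keeping the
-- pair. The estimate grows by the conditional entropy E_x[H(f a)].
samp :: (a -> Distr b) -> Distr a -> Distr (a, b)
samp f (Distr xs h) =
  Distr [((a, b), p * q) | (a, p) <- xs, (b, q) <- outcomes (f a)]
        (h + sum [p * entropyEst (f a) | (a, p) <- xs])

-- InvFmap k x (assumed semantics): push x through k. Assumes k is a
-- bijection, so the entropy, and hence the estimate, is unchanged.
invFmap :: (a -> b) -> Distr a -> Distr b
invFmap k (Distr xs h) = Distr [(k a, p) | (a, p) <- xs] h

-- Unsamp g x (assumed semantics): drop the second component of each pair,
-- subtracting the cross-entropy E[-log g(a | b)] as a stand-in for H(A | B).
-- That cross-entropy is >= H(A | B), so an inaccurate g only pushes the
-- estimate down.
unsamp :: (Ord b, Eq a) => (b -> Distr a) -> Distr (b, a) -> Distr b
unsamp g (Distr ps h) =
  Distr (M.toList (M.fromListWith (+) [(b, p) | ((b, _), p) <- ps]))
        (h - sum [p * negate (log (probOf a (g b))) | ((b, a), p) <- ps, p > 0])

-- One layer, mirroring Unsamp g (InvFmap (\(x, y) -> (y, x)) (Samp f x)).
layer :: (Ord b, Eq a) => (a -> Distr b) -> (b -> Distr a) -> Distr a -> Distr b
layer f g x = unsamp g (invFmap (\(u, v) -> (v, u)) (samp f x))

-- Toy layer (hypothetical): copy a bit, flipping it with probability eps.
noisyCopy :: Double -> Bool -> Distr Bool
noisyCopy eps a = mkDistr [(a, 1 - eps), (not a, eps)]

-- Exact Bayesian inverse p(a | b) of f under the prior x, for comparison.
bayesInverse :: Eq b => Distr a -> (a -> Distr b) -> b -> Distr a
bayesInverse x f b = mkDistr [(a, p * probOf b (f a) / z) | (a, p) <- outcomes x]
  where
    z = sum [p * probOf b (f a) | (a, p) <- outcomes x]
```

For example, take prior = mkDistr [(True, 0.8), (False, 0.2)] and f = noisyCopy 0.1. Then entropyEst (layer f (bayesInverse prior f) prior) matches exactEntropy of that layer's outcomes, while entropyEst (layer f (noisyCopy 0.1) prior) comes out lower. Under these assumptions, the inaccuracy of the inverses shows up as slack in a lower bound on the entropy instead of breaking the construction outright. Stacking layers keeps the estimate a lower bound, but the errors add up.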