At least neural networks are a coherent class of algorithms, with lots of architectural variations and hyperparameters to tune, but still functionally similar. General Bayesian inference, on the other hand, is a broad framework with dozens of types of algorithms for different tasks, based on different assumptions and with different functional structure.
I don’t agree with this memetic taxonomy. I consider neural networks to be mostly synonymous with algebraic tensor networks—general computational graphs over tensors. As such ANN describes a modeling language family, equivalent in expressibility to binary circuit models (and thus Turing universal) but considerably more computationally efficient. The tensor algebra abstraction more closely matches physical hardware reality.
So as a general computing paradigm or circuit model, ANNs can be combined with any approximate inference technique. Backpropagation on the log-likelihood is just one obvious approximate method.
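To make the "circuit plus one choice of inference" framing concrete, here is a minimal sketch (my own toy example, not from any linked paper): a one-layer tensor graph, i.e. logistic regression, fit by gradient descent on the negative log-likelihood. The same circuit could just as well be paired with a different objective or inference scheme.

```python
import numpy as np

# Toy "tensor network": a single linear map followed by a sigmoid.
# Backprop on the mean negative log-likelihood is just one inference
# choice layered on top of this circuit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])           # hypothetical ground truth
y = (X @ true_w + 0.1 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # forward pass through the graph
    grad = X.T @ (p - y) / len(y)             # gradient of the mean NLL
    w -= 0.5 * grad                           # plain gradient-descent step

print(np.sign(w))  # recovers the signs of true_w: [ 1. -1.  1.]
```

Swapping the loss or the optimizer leaves the circuit untouched, which is the point: the modeling language and the inference method are separable choices.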
You could as well say that once we formulated the theory of universal computation and we had the first digital computers up and running, then we had all the math figured out
Not quite, because it took longer for the math for inference/learning to be worked out, and even somewhat longer for efficient approximations—and indeed that work is still ongoing.
Regardless, even if all the math was available in 1956 it wouldn’t have mattered, as they still would have had to wait 60 years or so for efficient implementations (hardware + software).
The paper I linked, IMHO, may shed some light on why this happened: one of the most popular evaluation measures and training objectives, the negative log-likelihood (aka empirical cross-entropy), which captures well our intuition of what a good model must do in binary (or low-dimensional) classification tasks, may break down in the high-dimensional regime typical of some unsupervised tasks such as sampling.
To the extent that this is a problem in practice, it’s a problem with typical sampling, not the measure itself. As I mentioned earlier, I believe it can be solved by more advanced sampling techniques that respect total KC/Solomonoff probability. Using these hypothetical correct samplers, good models should always produce good samples.
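A standard illustration of why density-based intuitions fail in high dimensions (my framing, not taken from the paper): under a standard Gaussian the highest-likelihood point is the origin, yet actual samples concentrate on a thin shell of radius ~sqrt(d), nowhere near the mode. A sampler that chased high likelihood alone would produce atypical points.

```python
import numpy as np

# Samples from N(0, I_d) concentrate at radius ~sqrt(d): in high
# dimensions no sample lands near the maximum-density point (the origin).
rng = np.random.default_rng(0)
for d in (2, 1000):
    samples = rng.normal(size=(10_000, d))
    radii = np.linalg.norm(samples, axis=1)
    print(d, round(radii.mean(), 1), round(radii.std(), 2))
# For d=1000 the mean radius is ~31.6 (i.e. sqrt(1000)) with std ~0.7,
# so the "typical set" is a thin shell far from the mode.
```

This is the sense in which a model can score well on log-likelihood while naive mode-seeking sampling from it looks nothing like the data, and why sampling schemes that target the typical set matter.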
That being said, I agree that generative modelling, and realistic sampling in particular, is an area ripe for innovation.
I’ve never seen a modern generative model generate realistic samples of natural images or speech.
You probably have seen this, actually, in the form of CG in realistic video games or films. Of course those models are hand-crafted rather than learned probabilistic generative models. I believe that cross-fertilization of ideas/techniques from graphics and ML will transform both in the near future.
The current image generative models in ML are extremely weak when viewed as procedural graphics engines—for the most part they are just 2D image blenders.