Obviously SLT comes to mind, and some people have tried to claim that SLT suggests that neural network training is actually more like the Solomonoff prior than the speed prior (e.g. Bushnaq), although I think that work is pretty shaky and may well not hold up.
That post is superseded by this one. It was just a sketch I wrote up mostly to clarify my own thinking; the newer post is the finished product.
It doesn’t exactly say that neural networks have Solomonoff-style priors. It depends on the NN architecture. E.g., if your architecture is polynomials, or MLPs that only get one forward pass, I do not expect them to have a prior anything like that of a compute-bounded Universal Turing Machine.
And NN training adds further complications. All the results I talk about are for Bayesian learning, not things like gradient descent. I agree that this changes the picture and that questions about the learnability of solutions become important. You no longer just care how much volume the solution takes up in the prior; you care how much volume each incremental building block of the solution takes up within the practically accessible search space of the update algorithm at that point in training.
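To make the "volume in the prior" framing concrete, here is a small toy sketch of my own (not from the post, and all the specifics, the XOR task, the 2-4-1 tanh MLP, the unit-Gaussian prior, the tolerance, are arbitrary choices): it crudely estimates, by Monte Carlo, what fraction of parameter vectors drawn from a Gaussian prior already fit the data. That fraction is the "volume" a Bayesian learner cares about; gradient descent would instead care about which of those regions are incrementally reachable from initialisation, which this sketch deliberately does not capture.

```python
# Toy Monte Carlo estimate of the prior volume of fitting solutions
# for a tiny MLP on XOR. Illustration only; all settings are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR, learned by a 2-4-1 MLP with a tanh hidden layer.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

H = 4                            # hidden width (arbitrary)
n_params = 2 * H + H + H + 1     # W1 (2xH), b1 (H), w2 (H), b2 (1)

def forward(theta, X):
    """Unpack a flat parameter vector and run the MLP."""
    W1 = theta[:2 * H].reshape(2, H)
    b1 = theta[2 * H:3 * H]
    w2 = theta[3 * H:4 * H]
    b2 = theta[4 * H]
    h = np.tanh(X @ W1 + b1)
    return h @ w2 + b2

def mse(theta):
    return np.mean((forward(theta, X) - y) ** 2)

# Fraction of prior samples whose outputs already fit the data to within
# a loose tolerance. This may well come out as zero at this sample size,
# which itself illustrates how small the volume of fitting solutions is.
n_samples = 50_000
tolerance = 0.05
hits = 0
for _ in range(n_samples):
    theta = rng.normal(0.0, 1.0, size=n_params)   # unit-Gaussian "prior"
    if mse(theta) < tolerance:
        hits += 1

print(f"estimated prior volume of fitting solutions: {hits / n_samples:.2e}")
```

The point of the contrast: a Bayesian learner weights solutions by this kind of prior volume, whereas an update algorithm like SGD only ever sees the part of parameter space its incremental steps can actually reach from where it currently is.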