Hi Artemy. Welcome to LessWrong!
Agree completely with what Zach is saying here.
We need two facts:
(1) the world has a specific inductive bias
(2) neural networks have the same specific inductive bias
Indeed, no-free-lunch arguments suggest that any good learner must have a good inductive bias. In a sense, learning is ‘mostly’ about having the right inductive bias.
We call this specific inductive bias a simplicity bias. Informally it agrees with our intuitive notion of low complexity.
Rk. Conceptually it is a little tricky, since simplicity is in the eye of the beholder: by changing the background language we can make anything, however algorithmically complex, come out as low-complexity. People have been working on this problem for a while, but at the moment it seems radically hard.
IIRC Aram Ebtekar has a proposed solution that John Wentworth likes; I haven’t understood it myself yet. I think what one wants to say is that the [algorithmic] mutual information between the observer and the observed is low, where the observer implicitly encodes the universal Turing machine used. In other words: the world is such that observers within it observe it to have low complexity with respect to their implicit reference machine.
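For concreteness, the standard way the "eye of the beholder" issue shows up formally is the invariance theorem: Kolmogorov complexity is only pinned down up to an additive constant depending on the pair of reference machines,

$$\bigl|K_U(x) - K_V(x)\bigr| \;\le\; c_{UV} \quad \text{for all } x,$$

so for any fixed x one can pick a machine V that hard-codes x and makes K_V(x) trivially small. (Standard notation, not anything specific to Aram's proposal.)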
Regardless, the fact that the real world satisfies a simplicity bias is, to my mind, difficult to explain without anthropics. I am afraid we may end up having to appeal to some form of UDASSA, but others may have other theological commitments.
That’s the bird’s-eye view of simplicity bias. If you ignore the above issue and accept some sort of formally-tricky-to-define but informally “reasonable” simplicity, then the question becomes: why do neural networks have a bias towards simplicity? Well, they have a bias towards degeneracy, and simplicity and degeneracy are intimately connected; see e.g.:
https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform?commentId=zH42TS7KDZo9JimTF
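To spell out the degeneracy–simplicity link in one line (my gloss, not the linked shortform's exact statement): if the effective prior over functions is the parameter-space volume that realizes them,

$$P(f) \;=\; \frac{\operatorname{Vol}\{\,w : f_w = f\,\}}{\operatorname{Vol}(W)},$$

then a bias towards degenerate (high-volume) functions is a bias towards simple functions exactly when volume and complexity are coupled, as in coding-theorem-style bounds of the form $P(f) \lesssim 2^{-a\tilde{K}(f)+b}$, with $\tilde{K}$ a computable complexity proxy and $a, b$ constants.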
“Some form of UDASSA” seems to be right. Why not simply take “difficult to explain otherwise” as evidence (defeasible, of course, as with evidence for physical theories)?
Alex, thanks for the welcome (happy to be here!) and the summary.
I’m generally familiar with this line of thought. My main comment is that the Solomonoff perspective feels somewhat opposite to the usual NFL/“inductive bias” story (where the claim is that a good image model needs a good prior over image statistics, etc.). Yes, a simplicity bias is a kind of inductive bias, but it’s supposed to be universal (domain independent). And if generic architectures really give us this prior “for free”, as suggested by results like Dingle et al., then it seems the hard part isn’t the prior, but rather being able to sample from it conditioned on low training error (i.e., the training process).
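(As an aside, the “prior for free” claim is cheap to probe in a toy setting. A minimal sketch, under assumptions of my own choosing rather than Dingle et al.'s actual experiment: a small random ReLU net on Boolean inputs, with LZMA-compressed length of the truth table as a crude complexity proxy. Sample random weights, record which truth table each draw realizes, and check whether frequently-sampled functions also compress well.)

```python
import lzma
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
N_INPUTS = 7          # Boolean inputs; 2^7 = 128-bit truth tables (my choice)
HIDDEN = 32           # hidden width (my choice)
N_SAMPLES = 20_000    # number of random weight draws

# All 2^N_INPUTS Boolean input vectors, encoded in {-1, +1}.
X = np.array([[1.0 if (i >> j) & 1 else -1.0 for j in range(N_INPUTS)]
              for i in range(2 ** N_INPUTS)])

def random_net_truth_table(rng):
    """Sample a random 1-hidden-layer ReLU net; return its truth table as a bit string."""
    W1 = rng.normal(0.0, 1.0 / np.sqrt(N_INPUTS), size=(N_INPUTS, HIDDEN))
    b1 = rng.normal(0.0, 1.0, size=HIDDEN)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), size=HIDDEN)
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer
    out = h @ w2                        # scalar logit per input
    return "".join("1" if o > 0 else "0" for o in out)

def complexity_proxy(bits: str) -> int:
    """Crude description-length proxy: LZMA-compressed size in bytes."""
    return len(lzma.compress(bits.encode()))

counts = Counter(random_net_truth_table(rng) for _ in range(N_SAMPLES))

# Under the simplicity-bias story, the most frequently sampled functions
# should tend to have a small complexity proxy.
print("count   prob    lzma_bytes")
for table, c in counts.most_common(10):
    print(f"{c:5d}  {c / N_SAMPLES:.4f}  {complexity_proxy(table):4d}")
```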
That said, this line of reasoning—if taken literally—seems difficult to reconcile with some observed facts: e.g., architecture choice and data augmentation do seem to matter for generalization. To me, these suggest that you need some inductive bias beyond algorithmic simplicity alone. (Possibly another way to think about it: the smaller the dataset, the more the additive constant in Kolmogorov complexity starts to matter.)
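(One standard way to make that additive-constant point concrete, using the usual Occam-style bound rather than anything specific to this thread: for hypotheses described in a prefix-free language, with probability at least $1-\delta$, every $f$ that fits the $n$ training points satisfies roughly

$$\text{test error}(f) \;\lesssim\; \sqrt{\frac{K(f)\ln 2 + \ln(1/\delta)}{2n}},$$

and changing the description language (architecture, augmentation scheme, ...) shifts $K(f)$ by an additive constant whose effect on the bound fades like $1/\sqrt{n}$, so it is precisely at small $n$ that the choice of reference machine, i.e. the inductive bias beyond simplicity per se, matters.)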