Alex, thanks for the welcome (happy to be here!) and the summary.
I’m generally familiar with this line of thought. My main comment is that the Solomonoff perspective feels somewhat opposite to the usual NFL/“inductive bias” story (where the claim is that a good image model needs a good prior over image statistics, etc.). Yes, a simplicity bias is a kind of inductive bias, but it’s supposed to be universal (domain independent). And if generic architectures really give us this prior “for free”, as suggested by results like Dingle et al., then it seems the hard part isn’t the prior, but rather being able to sample from it conditioned on low training error (i.e., the training process).
That said, this line of reasoning, if taken literally, seems difficult to reconcile with some observed facts: e.g., architecture choice and data augmentation do seem to matter for generalization. To me, these suggest that you need some inductive bias beyond algorithmic simplicity alone. (Possibly another way to think about it: the smaller the dataset, the more the additive constant in Kolmogorov complexity starts to matter.)
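To make the additive-constant point a bit more concrete, here is a rough sketch. The notation ($U$, $V$, $c_{U,V}$, $h_1$, $h_2$, $D$) is introduced here only for illustration, and it assumes the simplification that the prior over hypotheses is proportional to $2^{-K_U(h)}$. The invariance theorem says that for any two universal machines $U$ and $V$ there is a constant $c_{U,V}$, independent of $x$, with

$$
\lvert K_U(x) - K_V(x) \rvert \le c_{U,V}.
$$

Under that prior, the posterior log-odds between two hypotheses given data $D$ are

$$
\log_2 \frac{P_U(h_1 \mid D)}{P_U(h_2 \mid D)} = \log_2 \frac{P(D \mid h_1)}{P(D \mid h_2)} + K_U(h_2) - K_U(h_1),
$$

and switching the reference machine from $U$ to $V$ shifts the last difference by at most $2\,c_{U,V}$. The likelihood term typically grows with the number of examples in $D$, so on a large dataset the choice of machine washes out, while on a small one a shift of size up to $2\,c_{U,V}$ can decide which hypothesis wins. In that sense the "constant" plays the role of a domain-specific inductive bias.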