adamShimi comments on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian

adamShimi 31 Dec 2020 10:35 UTC
LW: 5 AF: 3
Okay, I think the confusion here is that I (and the OP, AFAIK) don’t mean the same thing as you do by the word “function”. Solomonoff induction is about programs (usually Turing machines), and from your comment that seems to be the sense of “function” you have in mind. But a function as I’m using the word here (and I’m pretty sure this is the meaning in the quote) is just an input/output relation. So it doesn’t make sense to say that two distinct functions are equivalent (at least in the way you’re talking about), because they necessarily differ on some input (otherwise they would be the same function). On the other hand, two programs can be equivalent in that they produce the same output on every input (they compute the same function).
So from the input/output standpoint, if you picture the functions laid out along a line, the correct function is a single point. In that sense there are indeed far more functions that generalize poorly than functions that generalize well (this becomes a bit more complicated when you consider how well each function generalizes, but you still generally have far more ways to be wrong on each data point than the single right answer).
I think this makes sense as a clarification of the quote: this part of the post lays out the relatively classic argument that neural networks generalize pretty well and are not really limited in which input/output relations they can express, yet most input/output relations do not generalize correctly to new data. So there must be something more going on.
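To make the program/function distinction concrete, here is a minimal sketch in Python. The names `prog_a`, `prog_b`, `prog_c` and `truth_table` are made up for illustration. Two syntactically different programs are compared by the input/output table they induce over all pairs of booleans: `prog_a` and `prog_b` are different programs but the same function, while `prog_c` is a different function because it disagrees on at least one input.

```python
from itertools import product

def prog_a(x, y):
    # One program for XOR: compare the inputs directly.
    return x != y

def prog_b(x, y):
    # A syntactically different program for XOR.
    return (x or y) and not (x and y)

def prog_c(x, y):
    # A program for OR; it differs from XOR on the input (True, True).
    return x or y

def truth_table(program):
    # The "function" in the input/output sense: one output per possible input.
    return tuple(bool(program(x, y)) for x, y in product([False, True], repeat=2))

# Different programs, same input/output relation.
same_function = truth_table(prog_a) == truth_table(prog_b)
# Different input/output relation, so a different function.
different_function = truth_table(prog_a) != truth_table(prog_c)
print(same_function, different_function)
```

On a finite input domain like this one, checking equivalence of programs is just comparing tables; for general Turing machines it is undecidable, which is part of why the two notions are worth keeping apart.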
Does that make sense to you?
- Daniel Kokotajlo 31 Dec 2020 10:51 UTC
  LW: 4 AF: 2
  Ahhhh, yes, thank you, that hits the nail on the head!
  So I guess my original question has been answered, and I’m left with a new question about whether the analogy to Solomonoff induction might be useful here: simpler functions get more weight in the universal prior because there are more programs that compute them; perhaps simpler functions get more weight in a neural network’s implicit prior because there are more parameter settings that compute them (i.e. a bigger region of parameter space), and maybe both of these facts are true for the same reason.
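One rough way to probe the "bigger region of parameter space" idea from the reply above is to sample parameters at random and count how often each input/output function shows up. The sketch below rests entirely on illustrative assumptions: a tiny one-hidden-layer tanh network on 3-bit inputs, standard Gaussian parameters, and an output thresholded at zero. None of these choices come from the thread or the post. It only measures how unevenly parameter volume is spread over functions; it does not by itself show that the frequent functions are the simple ones.

```python
import math
import random
from collections import Counter
from itertools import product

N_BITS = 3         # input size (assumption)
HIDDEN = 4         # hidden units (assumption)
N_SAMPLES = 20000  # number of random parameter settings to draw
inputs = list(product([0.0, 1.0], repeat=N_BITS))

def sample_params(rng):
    # Draw every weight and bias independently from a standard Gaussian.
    w1 = [[rng.gauss(0, 1) for _ in range(N_BITS)] for _ in range(HIDDEN)]
    b1 = [rng.gauss(0, 1) for _ in range(HIDDEN)]
    w2 = [rng.gauss(0, 1) for _ in range(HIDDEN)]
    b2 = rng.gauss(0, 1)
    return w1, b1, w2, b2

def network_function(params):
    # Map a parameter setting to the input/output table it computes.
    w1, b1, w2, b2 = params
    outputs = []
    for x in inputs:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        logit = sum(w * h for w, h in zip(w2, hidden)) + b2
        outputs.append(1 if logit > 0 else 0)
    return tuple(outputs)

rng = random.Random(0)  # fixed seed so reruns give the same counts
counts = Counter(network_function(sample_params(rng)) for _ in range(N_SAMPLES))

# Estimated share of parameter space for the most frequent functions.
for table, count in counts.most_common(5):
    print(table, count / N_SAMPLES)
print("distinct functions seen:", len(counts), "of", 2 ** len(inputs))
```

If the estimated shares are far from uniform, that is at least consistent with the analogy: many parameter settings collapse onto a few functions, much as many programs collapse onto the same function in the universal prior. Whether the two effects share a cause is exactly the open question.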