This explanation via counting argument doesn’t seem very satisfying to me, because hypotheses expressed in natural language strike me as highly expressive and thus likely highly degenerate, like programs on a (bounded) universal Turing machine, rather than shallow, inexpressive, and forming a non-degenerate basis for function space, like polynomials.
The space of programs automatically has exponentially more points corresponding to equivalent implementations of short, simple programs. So, if you choose a random program from the set of programs that both fit in your head and can explain the data, you’re automatically implementing a Solomonoff-style simplicity prior. This means that gathering all the data first and inventing a random explanation to fit it afterwards ought to work perfectly fine: the explanation would automatically avoid overfitting and generalise well to new data by default.
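To make that counting effect concrete, here's a toy sketch of my own (none of the specifics are from the argument above): enumerate every Boolean formula over three inputs built from NOT/AND/OR with at most seven nodes, and count how many distinct formulas ("programs") compute each 8-row truth table ("behavior"). Simple behaviors get vastly many equivalent implementations, while a behavior that needs a bigger program, like 3-bit parity, gets none at all at this size.

```python
from functools import lru_cache
from collections import Counter

MASK = 0xFF  # a truth table over 3 inputs fits in 8 bits


def var_table(j):
    """Truth table of input variable x_j: bit i is set iff bit j of assignment i is 1."""
    return sum(1 << i for i in range(8) if (i >> j) & 1)


LEAVES = [var_table(j) for j in range(3)]  # x0, x1, x2


@lru_cache(maxsize=None)
def tables_of_size(n):
    """Multiset (as a tuple) of truth tables of all formulas with exactly n nodes."""
    if n == 1:
        return tuple(LEAVES)
    out = [t ^ MASK for t in tables_of_size(n - 1)]  # NOT applied to smaller formulas
    for a in range(1, n - 1):                        # split remaining nodes between subtrees
        b = n - 1 - a
        for ta in tables_of_size(a):
            for tb in tables_of_size(b):
                out.append(ta & tb)                  # AND
                out.append(ta | tb)                  # OR
    return tuple(out)


def count_behaviours(max_nodes):
    """Count, for each truth table, how many formulas of up to max_nodes nodes compute it."""
    counts = Counter()
    for n in range(1, max_nodes + 1):
        counts.update(tables_of_size(n))
    return counts


counts = count_behaviours(7)
total = sum(counts.values())

# Truth table of 3-bit parity: bit i set iff assignment i has an odd number of 1s.
PARITY3 = sum(1 << i for i in range(8) if bin(i).count("1") % 2)
```

Running this, the number of formulas dwarfs the number of distinct behaviors, so a uniform draw over formulas is a heavily skewed draw over behaviors; and `counts` contains no entry for `PARITY3`, since parity needs a formula bigger than seven nodes. That's the degeneracy doing the work of a simplicity prior.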
I think whatever is going on here is instead related to the dynamics of how we invent hypotheses. It doesn’t seem to just amount to picking random hypotheses until we find one that explains the data well. What my head is doing seems much more like some kind of local search. I start at some point in hypothesis space, and use heuristics, logical reasoning, and trial and error to iteratively make conceptual changes to my idea, moving step by step through some kind of hypothesis landscape where distance is determined by conceptual similarity of some sort.
Since this is a local search rather than an unbiased global draw, it can have all kinds of weird pathologies and failure modes, depending on the idiosyncrasies of the update rules it uses. I’d guess that the human tendency to overfit when presented with all the data from the start is some failure mode of this kind. No idea what specifically is going wrong there, though.
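The contrast with a global draw is easy to sketch. Here's a minimal made-up example (the landscape and numbers are invented purely for illustration): greedy hill-climbing on a toy one-dimensional hypothesis landscape stalls at whichever local optimum is nearest its starting point, in a way a global draw never would.

```python
def hill_climb(score, start, neighbours):
    """Greedy ascent: move to the best neighbour until no neighbour improves the score."""
    current = start
    while True:
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best


# Toy landscape: a local peak at 5 (height 8) and the global peak at 20 (height 10).
def score(n):
    return 8 - abs(n - 5) if n < 12 else 10 - abs(n - 20)


def neighbours(n):
    """Conceptually adjacent hypotheses: one step left or right, within [0, 31]."""
    return [m for m in (n - 1, n + 1) if 0 <= m <= 31]


assert hill_climb(score, 0, neighbours) == 5    # stuck on the nearby local peak
assert hill_climb(score, 30, neighbours) == 20  # a different start finds the global peak
```

The failure depends entirely on the starting point and the neighbourhood structure, which is the sense in which the pathologies come from the update rule rather than from the hypothesis space itself.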
Despite the simplicity prior, “this nontrivial program implements the behavior I want” almost always fails to be true because the program contains bugs. Even after you fix the bugs, it still almost always fails to be true because it contains even more bugs.
False scientific theories don’t seem to fail in quite the same ways, and I’m not sure how much of that is due to differences in the search processes versus the search spaces (e.g. physics seems quite different from Python).