Using a high-powered black-box technique to regress a one-dimensional continuous outcome against a one-dimensional continuous predictor seems misguided.
If you want to characterize how well your evolutionary learning idea works, try it on data that you’ve generated, where you know the “underlying math”. See if you can recover the program that generated the data or one that’s equivalent to it. Or try it on really big, messy data where no one knows the right answer and see if you/it can do better than the obvious competitors like SVM, k-NN, CART, etc.
The middle ground of working on an easy/messy problem, where any sane method will give you an adequate answer but there’s no known ground truth, is not going to make a very compelling story.
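A toy version of the suggested experiment, as a sketch: generate data from a known "program", then run a search over candidate programs and check whether it recovers the generator. The candidate set and the hidden generator here are made up for illustration; a real evolutionary learner would search a much larger program space.

```python
import random

random.seed(1)
xs = [random.uniform(-5, 5) for _ in range(50)]
ys = [2 * x + 3 for x in xs]  # the hidden ground-truth generator

# A hypothetical candidate pool standing in for the evolved population.
candidates = {
    "2*x + 3": lambda x: 2 * x + 3,
    "x**2":    lambda x: x ** 2,
    "3*x - 2": lambda x: 3 * x - 2,
    "x + 4":   lambda x: x + 4,
}

def sse(f):
    # Sum of squared errors of candidate f against the generated data.
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

best = min(candidates, key=lambda name: sse(candidates[name]))
print(best)  # the search should pick out the true generator
```

The point is that because you wrote the generator yourself, "success" is unambiguous: either the method recovers `2*x + 3` (or an equivalent program) or it doesn't.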
I don’t get this. You could have a rather complicated generator for this data set. A simple regression would imply the data points were independent, but the value at time T may well have (and likely has) a relation to the value at T-3. So it seems a good problem to me.
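A minimal sketch of that serial-dependence point, with made-up parameters: a series where each value is driven by the value three steps earlier, so treating the points as independent draws misses the structure entirely.

```python
import random

random.seed(0)
n = 2000
# First three values are just noise; after that, s[t] depends on s[t-3].
s = [random.gauss(0, 1) for _ in range(3)]
for t in range(3, n):
    s.append(0.8 * s[t - 3] + random.gauss(0, 1))

def autocorr(series, lag):
    # Sample autocorrelation at the given lag.
    m = sum(series) / len(series)
    num = sum((series[t] - m) * (series[t - lag] - m)
              for t in range(lag, len(series)))
    den = sum((x - m) ** 2 for x in series)
    return num / den

print(autocorr(s, 1), autocorr(s, 3))  # lag-3 correlation dominates
```

The lag-1 autocorrelation is near zero while the lag-3 autocorrelation is near 0.8, which is exactly the kind of structure a regression that assumes independent points would never see.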
http://lesswrong.com/lw/9pl/automatic_programming_an_example/
Was this better?
I always want the shortest possible generating algorithm. Everything else, including any “dimensionality”, is just irrelevant.
Yes, I think that was better, because the ground truth is Kepler’s third law and jimrandomh pointed out your method actually recaptures a (badly obfuscated and possibly overfit) variant of it.
“Dimensionality” is totally relevant in any approach to supervised learning. And it matters even before you consider the bias/variance trade-off, etc.
Imagine that you have a high-dimensional predictor, of which one dimension completely determines the outcome and the rest are noise. Your shortest possible generating algorithm is going to have to pick out the relevant dimension. So as the dimensionality of the predictor increases, the algorithm length will necessarily increase, just for information-theoretic reasons.
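The information-theoretic point can be made concrete with a small sketch: if exactly one of d input dimensions determines the output, the generating program must name that dimension, and naming one index out of d costs at least ceil(log2 d) bits, so the minimal program length grows with log d.

```python
import math

def index_cost_bits(d):
    # Bits needed to name one dimension out of d (fixed-length code).
    return math.ceil(math.log2(d))

for d in (2, 16, 1024, 10**6):
    # The shortest program "return x[k]" must encode k somewhere,
    # and that encoding alone grows logarithmically with d.
    print(d, index_cost_bits(d))
```

So “dimensions irrelevant for the output fall out” is true of the learned function, but the cost of saying *which* dimensions fall out still shows up in the program’s length.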
How do you overfit Kepler’s law?
edit: Retracted. I see now looking at the actual link the result wasn’t just obfuscated but wrong, and so the manner in which it’s wrong can overfit of course (and that matches the results).
To the extent that Kepler’s laws are exact only for two-body systems of point masses (so I guess calling Kepler’s third law the ground truth is a bit problematic) and to the extent that the data are imperfectly observed, there are residuals that over-eager models can try to match.
Edit: More generally, you don’t overfit the underlying law, you overfit noisy data generated by the underlying law.
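A sketch of that distinction, with invented data: points generated by Kepler’s third law (T² = a³, so log T = 1.5·log a + noise) can be fit either by the true straight line in log-log space or by a model flexible enough to memorize the noise, such as the interpolating polynomial through every point.

```python
import math
import random

random.seed(0)
# Hypothetical noisy observations generated by the underlying law.
a = [1.0 + 0.5 * i for i in range(8)]  # semi-major axes (arbitrary units)
logs = [(math.log(x), 1.5 * math.log(x) + random.gauss(0, 0.05)) for x in a]

# "Sane" model: least-squares line in log-log space (the true form).
n = len(logs)
mx = sum(x for x, _ in logs) / n
my = sum(y for _, y in logs) / n
slope = (sum((x - mx) * (y - my) for x, y in logs)
         / sum((x - mx) ** 2 for x, _ in logs))
intercept = my - slope * mx

# Over-eager model: the degree-7 Lagrange interpolant, which matches
# every noisy point exactly -- zero training error, pure memorization.
def interp(x):
    total = 0.0
    for i, (xi, yi) in enumerate(logs):
        term = yi
        for j, (xj, _) in enumerate(logs):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

train_err_line = sum((slope * x + intercept - y) ** 2 for x, y in logs)
train_err_interp = sum((interp(x) - y) ** 2 for x, y in logs)
print(slope, train_err_line, train_err_interp)
```

The line recovers an exponent near the true 1.5 while leaving small residuals; the interpolant drives training error to zero by fitting the observation noise, and would predict badly on fresh orbits. That is overfitting the data, not the law.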
Kepler’s law holds well here. The influence of the other planets is negligible at the precision we dealt with.
Dimensions irrelevant to the output will fall out, regardless of whether they are random or not. If they contribute in any way at all, their influence will remain in the evolved algorithm.
The simplest algorithm in the Kolmogorov sense is the best you can hope for.