I’m highly confused by this part:
“If we care about determining θ for some scientific purpose, then good predictive accuracy may be an unsuitable metric. For instance, even though margarine consumption might correlate well with (and hence be a good predictor of) divorce rate, that doesn’t mean that there is a causal relationship between the two.”
So, the problem here is that we have so much data that a correlation exists by sheer coincidence. If we allow a model to find this correlation anyway, then this just seems like basic overfitting. I don’t see any different mechanism at work here.
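To make the "coincidence from sheer volume of data" reading concrete, here is a minimal sketch. It is an illustration only, not anything from the post. Every series is pure random noise, and the sizes (10 training points, 5000 candidate predictors, 1000 fresh points) and the helper names `pearson` and `noise` are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def noise(rng, n):
    """n independent standard-normal draws (hypothetical stand-in for a data series)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(0)
N_TRAIN, N_TEST, N_CANDIDATES = 10, 1000, 5000  # arbitrary, assumed sizes

# The "target" (think: divorce rate) and many unrelated candidate predictors.
target_train = noise(rng, N_TRAIN)
candidates = [noise(rng, N_TRAIN) for _ in range(N_CANDIDATES)]

# Pick the candidate that correlates best with the target on the data we have.
in_sample = [abs(pearson(c, target_train)) for c in candidates]
best = max(range(N_CANDIDATES), key=in_sample.__getitem__)
best_in_sample = in_sample[best]

# Fresh observations: both series simply continue as independent noise.
target_test = noise(rng, N_TEST)
candidate_test = noise(rng, N_TEST)
out_of_sample = abs(pearson(candidate_test, target_test))

print(f"best |r| on the {N_TRAIN} points used for selection: {best_in_sample:.2f}")
print(f"|r| of that same candidate on {N_TEST} fresh points: {out_of_sample:.2f}")
```

The selected candidate looks like a strong predictor on the points it was chosen on and a useless one on fresh data. That is the ordinary overfitting signature, produced here purely by searching over many candidates.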
Conversely, if this is trying to make a point about correlation vs. causation, why choose an example where the correlation isn’t real? I mean, presumably we don’t expect margarine consumption to be a predictor for divorce rate in the future. (I wouldn’t have been too surprised if there was a real correlation because both are caused by some third factor, but the site this links to lists other cases where the correlation is clearly coincidental, so I assume this one is, too.) Moreover, if it is about correlation vs. causation, this also seems to be unsolvable at the level of parameter selection.
And the thing is that the only other problem you’ve named with “declare the θ maximizing predictive accuracy to be the ‘correct’ value of θ” is:
“While θ might do a good job of predicting y in the settings we’ve seen, it may not predict y well in very different settings.”
Which (correct me if I’m wrong) also seems completely unsolvable at the level of parameter selection. In the example of the parametric curve, it could be solved at the level of model selection, but the description seems to point to the level of data selection.
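A small sketch of why this looks like a model-selection issue rather than a parameter-selection one. This is a hypothetical setup, not the post's curve example: the true relationship is assumed to be y = x², and the model family is assumed to be the one-parameter line y = θ·x.

```python
def fit_theta(xs, ys):
    """Least-squares θ for the one-parameter model y = θ * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mse(theta, xs, ys):
    """Mean squared error of y = θ * x on the given points."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    """Assumed true relationship (hypothetical)."""
    return x * x

# "Settings we've seen": x in [0, 1]. "Very different settings": x in [5, 6].
seen_x = [i / 10 for i in range(11)]
far_x = [5 + i / 10 for i in range(11)]
seen_y = [truth(x) for x in seen_x]
far_y = [truth(x) for x in far_x]

theta_seen = fit_theta(seen_x, seen_y)  # best θ for the seen settings
theta_far = fit_theta(far_x, far_y)     # best θ for the different settings

print(f"θ fitted on seen data: {theta_seen:.3f}")
print(f"  MSE on seen data: {mse(theta_seen, seen_x, seen_y):.4f}")
print(f"  MSE on far data:  {mse(theta_seen, far_x, far_y):.1f}")
print(f"θ fitted on far data:  {theta_far:.3f}")
print(f"  MSE on seen data: {mse(theta_far, seen_x, seen_y):.2f}")
```

The θ that is best on the seen range does badly on the far range, and the θ that is best on the far range does badly on the seen range. No way of choosing θ within this family avoids that. Only a different model (here, one with a quadratic term) or data covering the other range would.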
So, I don’t actually understand how either of these is an argument against optimizing θ for predictive accuracy.