“Should we trust models or observations?” In reply we note that if we had observations of the future, we obviously would trust them more than models, but unfortunately observations of the future are not available at this time.
Knutson and Tuleya, Journal of Climate, 2005.
In the absence of observations of future events, observation of the past performance of your model is advisable (and rare). If your confidence in the current accuracy of your model is much higher than the past performance of your models… you may be optimizing for something other than accuracy.
Agreed, and in particular on data you did not consider in formulating it.
Citation needed.
Your criticism is accepted. Out of curiosity: which groups have you observed embracing the practice of introducing data on past model performance when presenting a new model? I failed to provide a source, but is it your impression that this isn’t an area in which most people perform poorly?
I don’t know that I’ve observed anyone making an explicit practice of it outside of a formal setting.
But they are observable later. For example, we can now observe the predictions from 2005, when this quote originates.
It’s like saying “should we trust our model or the actual results?” The point is that you can only rely on models when making predictions; if you already have the results, you don’t need a model to come up with them.
No, what Thomas is saying is that we should compare the model’s predictions with the actual results and use that to calibrate how much we should trust the model.
I expressed myself poorly. “Should we trust our model or the actual results?” was a restatement of “Should we trust models or observations?”, meant to make clearer what the original quote actually meant (did it?): that you will never have future observations, only past observations, so when dealing with future events one can only depend on models. Of course, when the future unfolds we will be able to make the observations, but by then the future observations have become past observations. One can only steer the course of the future, never the past. Thus, trust in predictions.
I think people are somewhat talking past each other, and the following basically summarizes everyone’s position:
1) When dealing with the future, we have to make use of the best models available—we can’t base decisions now on data we don’t have yet.
2) New data should be used both to evaluate and improve models.
2a) It is important to test models against data that were not used in formulating the model, to avoid over-fitting. This can be new data as it becomes available, but should also be existing data reserved for the purpose.
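To make points 2 and 2a concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are synthetic (a noisy straight line), and the names `fit_line`, `fit_memorizer`, and `holdout_report` are hypothetical, not taken from any model discussed in this thread. It compares an ordinary least-squares line with a deliberately over-fitted "memorizer", scoring each both on the data used to build it and on data reserved for testing.

```python
import random


def fit_line(points):
    """Least-squares fit of y = intercept + slope * x; returns a predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(points):
    """Deliberately over-fitted: answers with the y of the nearest training x."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, points):
    """Average squared gap between the model's predictions and the observations."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


def holdout_report(seed=0, n=400, noise=1.0, holdout_fraction=0.5):
    """Fit both models on one part of the data and score them on both parts."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        # Assumed "true" process: a straight line plus Gaussian noise.
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)))

    # Reserve the tail of the (already random) data; it is never used for fitting.
    cut = int(n * (1.0 - holdout_fraction))
    train, holdout = data[:cut], data[cut:]

    report = {}
    for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
        model = fit(train)
        report[name] = {
            "train": mean_squared_error(model, train),
            "holdout": mean_squared_error(model, holdout),
        }
    return report


if __name__ == "__main__":
    for name, scores in holdout_report().items():
        print(f"{name}: train MSE {scores['train']:.3f}, "
              f"holdout MSE {scores['holdout']:.3f}")
```

The memorizer reproduces the data it was built from perfectly, so its training score says nothing about how far to trust it. In this toy setup, only the reserved data shows which model deserves more confidence.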