While I think your comment is generally true, I feel that it’s almost a disservice to emphasize this point. A huge number of problems in the statistical sciences could be overcome by just a tiny bit of uniformity among model checking procedures. If it were seen as “bad form” to submit a journal article without doing some model expansion checks, or without providing test statistic analysis that goes beyond classical p-values, then the quality of publications would jump. Even uniformity in classical p-value testing would help. I don’t really like classical p-values and test statistics, but they do say something about model validity. However, even in that domain, the test statistics are not always computed correctly; the way in which they were computed is rarely reported; and there are tons of systematic errors made by folks unfamiliar with the theory behind the statistical tests. Even if we had to keep using classical hypothesis testing, just getting people to apply the tests in a correct, systematic way would be a huge improvement. I would happily eat a stick of butter to get a world in which I didn’t have to read statistical results while thinking, “Okay, how did these authors mess this up? Are they reporting the right thing? Did they just keep gathering data until they reached a significance level they wanted?” and so on.
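To make the last worry concrete, here is a rough simulation sketch of the "keep gathering data until it's significant" problem. Everything in it is an assumption chosen for illustration: the data are standard normal with a true mean of zero (so every rejection is a false positive), the test is a two-sided z-test with known unit variance, and the function name `false_positive_rate` and the batch sizes are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_max=200, step=10, alpha=0.05,
                        trials=2000, seed=0):
    """Fraction of null experiments that end up 'significant'.

    peek=False: test once, after all n_max observations are in.
    peek=True:  test after every batch of `step` observations and
                stop as soon as p < alpha (optional stopping).
    Assumes n_max is a multiple of step.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        total = 0.0
        n = 0
        while n < n_max:
            # Gather one more batch; the true mean is 0, so H0 holds.
            for _ in range(step):
                total += rng.gauss(0.0, 1.0)
            n += step
            if peek or n >= n_max:
                z = total / math.sqrt(n)  # z-test with known sigma = 1
                if two_sided_p(z) < alpha:
                    rejections += 1
                    break
    return rejections / trials


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("peek every batch: ", false_positive_rate(peek=True))
```

Under these assumptions the fixed-sample run should land near the nominal 5% level, while the peeking run should come out several times higher, even though both report "p < 0.05". That gap is exactly the kind of thing a reader cannot detect unless the paper says how the test was applied.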
Essentially, I think your comparison breaks down in one important way. While it may be possible to write software that is bug-free, it’s not as easy to prove that your code is as efficient as it needs to be, or that it will generalize to new use cases. Unit testing definitely focuses on establishing correctness and the absence of bugs. But another, less directly objective part of it is showing that your code is well-suited to the computational task. Why did you pick the algorithm, design pattern, or language that you chose? If you truly design unit tests well, then some of the tests will also address slightly higher-level issues like these, which are closer to the model checking issues.
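As a small illustration of what I mean by a test that goes beyond correctness, here is a sketch with one ordinary correctness test and one test that checks the algorithm scales the way the task needs. The binary search, the probe counter, and the test names are all made up for this example; the only fact relied on is that binary search over n sorted items needs at most floor(log2(n)) + 1 probes.

```python
import math


def binary_search(items, target):
    """Return (index, probes); index is -1 if target is absent."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one probe per element inspected
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def test_correctness():
    # The usual kind of unit test: does it return the right answer?
    data = [1, 3, 5, 7, 9]
    assert binary_search(data, 7)[0] == 3
    assert binary_search(data, 4)[0] == -1
    assert binary_search([], 1)[0] == -1


def test_scaling():
    # A higher-level check: is the algorithm suited to large inputs?
    # Counting probes keeps the test deterministic, unlike wall-clock timing.
    n = 1_000_000
    data = range(n)  # sorted and indexable, without a large allocation
    bound = math.floor(math.log2(n)) + 1
    for target in (0, n // 2, n - 1, -1, n):
        _, probes = binary_search(data, target)
        assert probes <= bound
```

A linear scan would pass the first test and fail the second, which is the point: the second test encodes a judgment about which algorithm fits the task, not just whether the output is right.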
Also, I think the flip side of the Box quote is just as important: “All models are right; most are useless.” This is discussed here.