I would not be surprised if at least 20% of published studies include results that were affected by at least one coding error.
My intuition is that this underestimates the occurrence, depending on the field. Let us define:
CE = study has been affected by at least one coding error
SP = study relies on a significant (>500 LOC) amount of custom programming
Then I’d assign over 80% to P(CE|SP).
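As a back-of-the-envelope check on how that estimate relates to the 20% figure, taking the 80% at face value: the share of studies that rely on custom programming, P(SP), is not stated anywhere in this thread, so it is left as an unknown here.

```latex
\begin{align*}
P(CE) &= P(CE \mid SP)\,P(SP) + P(CE \mid \neg SP)\,P(\neg SP) \\
      &\ge P(CE \mid SP)\,P(SP) \\
      &> 0.8\,P(SP)
\end{align*}
% So P(CE) > 0.2 as soon as P(SP) \ge 0.25.
```

In other words, if at least a quarter of published studies in a field rely on significant custom code, the 80% estimate alone already pushes the overall rate past 20%, before counting any errors in studies without custom code.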
My mom is a semi-retired neuroscientist, and she’s been telling me recently how appalled she is by how many researchers around her abuse standard stats packages in egregious ways. The trouble is that scientists have access to powerful software packages for data analysis but often lack understanding of the concepts those packages implement, and consequently make absurd mistakes.
“Shooting yourself in the foot” is the occupational disease of programmers, and this applies even to non-career programmers, people who program as a secondary requirement of their job and may not even have any awareness that what they’re doing is programming.
In cases where a scientist is using a software package that they are uncomfortable with, I think output basically serves as the only error checking. First, they copy some sample code and try to adapt it to their data (while not really understanding what the program does). Then, they run the software. If the results are about what they expected, they think “well, we must have done it right.” If the results are different from what they expected, they might try a few more times and eventually get someone involved who knows what they are doing.
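One way to get a second check besides "does the output look like what I expected" is to run the adapted code on synthetic data where the right answer is known in advance. The sketch below is only an illustration of that idea, not anything described in this thread: `fit_slope` is a hypothetical stand-in for whatever analysis code was copied and adapted, and the slope, noise level, and tolerance are arbitrary assumptions.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope.

    Hypothetical stand-in for the analysis code that was copied and adapted.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def known_answer_check(true_slope=2.0, noise_sd=0.1, n=1000, tol=0.05, seed=0):
    """Generate data with a known slope and see whether the analysis recovers it.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_slope * x + 1.0 + rng.gauss(0, noise_sd) for x in xs]
    estimate = fit_slope(xs, ys)
    return abs(estimate - true_slope) < tol, estimate


if __name__ == "__main__":
    ok, estimate = known_answer_check()
    print("recovered known slope:", ok, "estimate:", round(estimate, 3))
```

If the check fails on data where the answer is known, the problem is in the code or in how it is being used, and that is found out before the real data has a chance to produce a plausible-looking wrong result.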
I strongly agree that you’re more likely to get wrong results out of someone else’s code than your own, because you tend to assume that they did their own error checking, and you also tend to assume that the code works the way you think it should (i.e. the way you would write it yourself), either or both of which may be false.
This is what led to my discovering a fairly significant error in my dissertation the day before I had to turn it in :) (Admittedly, self-delusion also played a role.)