It’s too bad we can’t do the same to find when bugs were introduced: developers don’t generally label bug-introducing commits as such.
If they did, it would make the bugs easier to find.
If I had to automate that, I’d consider the lines of code changed by the update. For each changed line, I’d find the last time that line had previously been changed; I’d then take the earliest of these dates.
However, many bugs are fixed not by lines changed, but by lines added. I’m not sure how to date those: by the date of the creation of the function containing the new line? By the date of the last change to that function? I can imagine situations where either of those could be valid. Again, I would take the earliest applicable date.
I should probably also ignore lines that are only comments.
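Here is a rough Python sketch of that heuristic, assuming "the update" is the bug-fix commit and that the per-line history has already been pulled out of version control by some other means. The `ChangedLine` record and the `estimate_introduction_date` function are made-up names for illustration, and the dates are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class ChangedLine:
    """One line touched by the bug-fix commit (hypothetical record)."""
    text: str
    # For a modified line: when that line was last changed before the fix.
    last_changed: Optional[date] = None
    # For an added line: candidate dates, e.g. the creation date and the
    # last-change date of the enclosing function.
    candidate_dates: List[date] = field(default_factory=list)


def estimate_introduction_date(lines: Iterable[ChangedLine]) -> Optional[date]:
    """Return the earliest applicable date across all touched lines."""
    dates: List[date] = []
    for line in lines:
        if line.last_changed is not None:
            dates.append(line.last_changed)
        dates.extend(line.candidate_dates)
    return min(dates) if dates else None


# Invented example: one modified line and one added line.
fix = [
    ChangedLine("total = a + b", last_changed=date(2011, 3, 14)),
    ChangedLine("if b is None: return a",
                candidate_dates=[date(2010, 6, 1), date(2011, 9, 30)]),
]
print(estimate_introduction_date(fix))  # 2010-06-01
```

The result is only an estimate of the earliest date the bug could have been introduced, not a proof of when it actually was.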
This one is interesting: it remained undetected for two years, was very cheap to fix (just add the commented-out line back in), but had large and hard-to-estimate indirect costs.
Among people who buy into the “rising cost of defects” theory, there’s a common mistake: conflating “cost to fix” and “cost of the bug”. This is especially apparent in arguments that bugs in the field are “obviously” very costly to fix, because the software has been distributed in many places, etc. That strikes me as a category error.
many bugs are fixed not by lines changed, but by lines added
Many bugs are also fixed by adding or changing (or in fact deleting) code elsewhere than the place where the bug was introduced—the well-known game of workarounds.
At least one well-known bug I know about consisted of commenting out a single line of code.
I take your point. I should only ignore lines that are comments both before and after the change; commenting or uncommenting code can clearly be a bugfix. (Or can introduce a bug, of course.)
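As a sketch of that revised rule in Python: the comment check below is deliberately crude (it assumes `#`-style line comments only), and both function names are hypothetical.

```python
def is_comment(line: str) -> bool:
    """Crude check: treats a line as a comment if it starts with '#'."""
    return line.strip().startswith("#")


def is_ignorable(before: str, after: str) -> bool:
    """Ignore a change only if the line is a comment on both sides."""
    return is_comment(before) and is_comment(after)


print(is_ignorable("# old note", "# new note"))          # True
print(is_ignorable("# check_input()", "check_input()"))  # False: uncommenting
print(is_ignorable("check_input()", "# check_input()"))  # False: commenting out
```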
Among people who buy into the “rising cost of defects” theory, there’s a common mistake: conflating “cost to fix” and “cost of the bug”. This is especially apparent in arguments that bugs in the field are “obviously” very costly to fix, because the software has been distributed in many places, etc. That strikes me as a category error.
Hmmm. “Cost to fix”, to my mind, should include the cost to find the bug and the cost to repair the bug. “Cost of the bug” should include all the knock-on effects of the bug having been active in the field for some time (which could be lost productivity, financial losses, information leakage, and just about anything, depending on the bug).
Many bugs are also fixed by adding or changing (or in fact deleting) code elsewhere than the place where the bug was introduced—the well-known game of workarounds.
I would assert that this does not fix the bug at all; it simply makes the bug less relevant (hopefully, irrelevant to the end user). If I write a function that’s supposed to return a+b, and it instead returns a+b+1, then this can easily be worked around by subtracting one from the return value every time it is used; but the downside is that the function is still returning the wrong value (a trap for any future maintainers) and, moreover, it makes the actual bug even more expensive to fix (since once it is fixed, all the extraneous minus-ones must be tracked down and removed).
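To make the a+b+1 example concrete, here is a small Python illustration; all the names are invented. It shows a caller that compensates for the bug, and what happens to that caller once the function itself is repaired.

```python
def buggy_add(a: int, b: int) -> int:
    """Supposed to return a + b, but is off by one."""
    return a + b + 1


def fixed_add(a: int, b: int) -> int:
    """The function as it should have been written."""
    return a + b


def make_caller(add):
    """Build a caller that works around the bug by subtracting one."""
    def caller(x: int, y: int) -> int:
        return add(x, y) - 1  # the extraneous minus-one
    return caller


print(make_caller(buggy_add)(2, 3))  # 5: the workaround hides the bug
print(make_caller(fixed_add)(2, 3))  # 4: fixing add() now breaks the caller
```

Every call site carrying the minus-one has to be found and changed at the same moment the function is fixed, which is where the extra cost comes from.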