Can you really be sure that a program that you write has at least a 99.9% chance of being correct without performing moderately extensive testing? Personally, I’d probably put significantly more confidence in (b).
A program this simple? Yes.
[EDITED to add: And I did say to test it on the primes up to 100.]
[EDITED again to add …] Here, just for reference, is what I would write, in less than a minute, to do the job. It is only intended to work for integers >= 2, and need not be bearably efficient for integers that aren’t very small.
def is_prime(n):
    for i in range(2, n):
        if n % i == 0: return False
    return True
Four short, simple lines corresponding quite exactly to the definition of primality.
So, what could be wrong in this program that could make it give a wrong answer for 1159? I can think of the following things. (1) I could have messed up the range of values of i to test against. (2) I could have got my variables muddled up, testing i%n instead of n%i or something of the kind. (3) I could have got my return conditions backwards, making this a not-prime test. (4) I could have messed up the control flow in some way that, e.g., does something crazy when we fall off the end of that loop. (5) I could have mixed up the % (modulus) operator with something else like division or exclusive-or. (6) I could have got myself confused about what the condition for primality actually is. (7) I could have done something else that I haven’t been able to think of.
Well, most of those are very low-probability, especially given that I’ve listed them explicitly and checked the program. Further, it’s easy to see that all of them would almost certainly fail on testing against the numbers from 2 to 100, and it seems likely that most errors in category 7 would do so too. (E.g., one can imagine messing up the test in such a way that numbers of the form prime^2 get mistakenly identified as prime, or so that it only catches primes of the form 4k+1, or something.)
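To illustrate that claim, here are mock-ups (mine, not gjm's) of bugs (2) and (3) from the list above; both fail on the very first candidates:

```python
def is_prime_swapped(n):
    # Mock-up of bug (2): testing i % n instead of n % i
    for i in range(2, n):
        if i % n == 0:
            return False
    return True

def is_prime_inverted(n):
    # Mock-up of bug (3): return conditions backwards
    for i in range(2, n):
        if n % i == 0:
            return True
    return False

# i % n is never 0 for 2 <= i < n, so the first version calls everything prime;
# the second calls exactly the composites prime. Testing on 2..19 exposes both:
print([n for n in range(2, 20) if is_prime_swapped(n)])
print([n for n in range(2, 20) if is_prime_inverted(n)])
```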
And, lo and behold, I then ran is_prime over the numbers below 100 and got a list which has 25 entries (which I know, or think I know, is the right answer), consists entirely of numbers I believe I know to be prime, and includes every number < 100 that I believe I know to be prime.
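That check might have looked something like this (a reconstruction; the exact command isn't preserved in the thread):

```python
def is_prime(n):
    # gjm's four-line checker: trial division by every i in [2, n)
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

primes_below_100 = [n for n in range(2, 100) if is_prime(n)]
print(primes_below_100)       # the familiar list from 2 to 97
print(len(primes_below_100))  # 25
print(is_prime(1159))         # False: 1159 = 19 * 61
```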
I am very comfortable indeed reckoning this as at least 1000:1 evidence that any specific positive integer on the order of 1000 is prime iff my is_prime function says it is and gives the same result on repeated runs.
Other failure modes I haven’t listed above but want to mention that I have thought of: There could be integer overflow problems. (But not for numbers this small, and not in Python unless there’s a weird Python bug I haven’t heard of.) There could be ridiculous errors in Python itself that make it get simple arithmetic on small numbers wrong. (But I’m running Python 2.7.3, which was released some time ago and was for some time the most commonly used Python version; I would surely have heard if it had that sort of error; in any case that class of error is very, very rare.) My computer could be messed up by cosmic rays or something. (Which is why I’d run the program multiple times on 1159.)
Or, of course, my brain could be messed up in some way that specifically interferes with my understanding of prime numbers or of what it takes to check a program’s correctness. The first is already dealt with in the discussion above. The second is one of the reasons why I would also check against someone else’s list of smallish prime numbers and do a number-theoretic test by hand. (And also, I think, rather improbable.)
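As an aside, the "number-theoretic test by hand" could be something like a Fermat test; this is my illustration, since the thread doesn't specify one:

```python
def fermat_witness(n, a):
    # By Fermat's little theorem, if a^(n-1) mod n != 1 then n is certainly composite
    return pow(a, n - 1, n) != 1

print(fermat_witness(1159, 2))  # True: base 2 already proves 1159 composite
```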
(7): indentation error. But I guess the interpreter will tell you i is used out of scope. That, or you would have gotten another catastrophic result on numbers below 10.
def is_prime(n):
for i in range(2,n):
if n%i == 0: return False
return True
(Edit: okay, that was LessWrong screwing up leading spaces. We can cheat that with unbreakable spaces.)
Yes, good point. I’d generalize: “I could have committed some syntax error (wrong indentation, wrong sort of brackets somewhere, etc.) that just happens to leave a program with correct syntax but unintended semantics.”
> I guess the interpreter will tell you i is used out of scope.
Depends on exactly what error occurs. (With the formatting the way it actually looks here it would just be a syntax error.)
> That, or you would have gotten another catastrophic result on numbers below 10.
This (aside from the simplicity of the code) is the real reason why I’m still happy with an error probability less than 0.001. It takes a really odd sort of error to produce correct results for the first 100 candidate primes and wrong results thereafter. I could write some programs with that sort of error, but they wouldn’t look at all like naive primality checkers with simple typos.
OK. I agree. Checking the output on numbers less than 100 vastly decreases the probability of error, especially after enumerating simple errors and realizing that none of them would return anything remotely correct on those numbers.
Well, but you can (a) perform moderately extensive testing, and (b) use redundancy.
If you write 3 programs for verifying primality (using different algorithms and possibly different programming languages/approaches), and all their results match, then you can assume a much higher confidence in correctness than for a single such program.
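A minimal sketch of that redundancy idea, assuming two independently written checkers (both functions and the cross-check range are my choices):

```python
def is_prime_trial(n):
    # Checker 1: plain trial division
    if n < 2: return False
    return all(n % d != 0 for d in range(2, n))

def is_prime_sieve(n):
    # Checker 2: independently coded Sieve of Eratosthenes
    if n < 2: return False
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if flags[p]:
            for m in range(p * p, n + 1, p):
                flags[m] = False
    return flags[n]

# Cross-check: any disagreement flags a bug in at least one implementation
disagreements = [n for n in range(2, 2000) if is_prime_trial(n) != is_prime_sieve(n)]
print(disagreements)  # [] if the two implementations agree
print(is_prime_trial(1159), is_prime_sieve(1159))
```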
I think multiple programs wouldn’t help very much unless they were run on different processors and written in different languages.
Yes. I agree. I am certainly not trying to say that 99.99% confidence in the primality status of a four digit number is not achievable.
I can take a stab at it (in python):
def isprime(n):
    if n == 1 or n % 2 == 0: return False
    i = 3
    while i*i <= n:
        if n % i == 0: return False
        i += 2
    return True

print(all(map(isprime, [3,5,7,11,13,17,23,53])))
print(isprime(1159))
(apparently LW’s markdown implementation butchers whitespace in code)
when I run that it tells me that 1159 isn’t prime. This programme is simple enough that, after spending an hour or so at my mental peak condition proving it correct, I would trust it with very high stakes.
And… You’d lose the bet. That program does have a mistake in it. See if you can find it.
(Answer: Gjb vf cevzr, ohg lbhe shapgvba pnyyf vg pbzcbfvgr.)
Good catch, but I bet MagnetoHydroDynamics would not lose the bet “after spending an hour or so, at [their] mental peak condition, proving it correct”.
True. It’s not a perfect illustration.
But it’s a good, ironic example of how the point that MagnetoHydroDynamics was trying to make can have a pretty serious flaw.
Meh. It was a special case he already knew to call out for special treatment. It would have been caught VERY quickly in testing, just like the same sort of error occurring in the general case (hmm, the primes are 4, 6, 8, 9, 10, 12? nooo...).
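For reference, the mistake is easy to exhibit by running the function (reproduced below with its bug intact) on the smallest prime:

```python
def isprime(n):
    # MagnetoHydroDynamics' checker, bug preserved
    if n == 1 or n % 2 == 0: return False
    i = 3
    while i*i <= n:
        if n % i == 0: return False
        i += 2
    return True

print(isprime(2))     # False, but 2 is prime: the n % 2 == 0 guard rejects it
print(isprime(1159))  # False, correctly: 1159 = 19 * 61
```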
99.99% confidence implies that he could write TEN THOUSAND programs of similar difficulty (with hours spent verifying each at peak mental condition) and make only ONE mistake.
As someone who’s done quite a bit of programming and reading about programming, I’d be impressed if he made it past a hundred without making two mistakes.
The industry average is 15-50 errors / 1000 lines of code. The amount of effort required to get below 0.1 errors/kLoC, like they do for the space shuttle, is very, very large. One person checking their own work for an hour doesn’t stand a chance.
Only make one mistake that makes it past testing.
That’s what I meant, though I can see that it’s not clear.
In this case a mistake would be writing the code, iterating and testing until you were satisfied, pronouncing it done, and then afterwards catching the error.
Hmm. I can definitely buy that a program of more complexity than this—less easily checked—would have that accuracy rate.
But prime checking is super simple to write and super simple to check. The only way you’d get an error through the obvious testing scheme given is to skip the testing.
You’re taking 100 bits of testing (which contain around 80 bits of information if not produced by means of the actual pattern) and treating them as around 13 bits of reliability.
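Spelling out that arithmetic as I read it (roughly a quarter of the numbers below 100 are prime, and 10^4:1 odds correspond to about 13 bits):

```python
import math

# Each of ~100 test outcomes carries about H(0.25) bits,
# modelling "is prime" as a 25%-likely event below 100
h = -(0.25 * math.log2(0.25) + 0.75 * math.log2(0.75))
print(round(100 * h, 1))             # about 81 bits of test information
print(round(math.log2(10 ** 4), 1))  # a 10^4:1 claim is only about 13.3 bits
```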
My experience with coding is that stupid obvious mistakes are way more likely than 1/10000. You write something slightly wrong, keep reading it as if it were right, and that’s that.
Determining if a number is prime is a bit of a nice case, I suppose, because it’s so amenable to testing. The structure of the mistakes you make is unlikely to match the structure of primes, so you’ll catch any mistakes more easily.
I’d still consider doing it 10000 times to be extremely difficult. Just adding 10000 six-digit numbers by hand, even with some cross-checking, is quite difficult.
Yes, stupid coding mistakes are more like 1 in 2 than 1 in 10^4; it is the testing that helps here.
Are the programs written for the space shuttle the same level of difficulty as this one?
Overall the space shuttle software is definitely more complicated than confirming a small prime, but on a line by line basis I don’t know. The space shuttle is more of a special case example, not something you compare against.
I’d use the industry average rate until you showed me you could hit 0.1 errors/kloc for a few years. For example, Microsoft hits 10-20 defects/kloc and they put significant effort into testing.
The most likely reason that your bug rate on these programs would be anomalously low is the fact that they’re small. Instead of writing one hundred thousand line program, you’re writing 10000 ten line programs. The complexities won’t compound as much.
The probability of a mistake in a program is not the same as the probability of being wrong about 1159 being composite. At a high reliability level, you factor the number, then you multiply the factors back together and check that as well.
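That factor-and-multiply-back check might look like this (a sketch; the commenter doesn't give code):

```python
def factorize(n):
    # Trial-division factorization; fine for numbers this small
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

factors = factorize(1159)
print(factors)  # [19, 61], so 1159 is composite
product = 1
for f in factors:
    product *= f
assert product == 1159  # multiplying the factors back recovers the number
```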