I think you are thinking about transparency differently than OP.
You seem to be thinking of informal, code-review-style transparency (hence the gripe about comments and function names), not the formal, mechanical verification that OP is (I think) talking about.
> At this point you could open up sources and have a final review (“transparency”) but honestly… what’s the point?
The point is that black-box testing can only realistically verify a tiny slice of the input-output space. You cannot prove theorems involving universal quantification, for example, without literally checking every input (and the set of inputs may not fit in the known universe). So if the system has some esoteric failure mode you didn’t manage to test for, you don’t catch it.
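A minimal sketch of that failure mode (the function `buggy_abs` is hypothetical, invented for illustration): a black-box random test suite passes with flying colors while the universally quantified property is false, because the single bad input is never sampled.

```python
import random

# Hypothetical function with an esoteric failure mode: behaves like abs()
# everywhere except one pathological input (echoing the classic C overflow
# where abs(INT_MIN) comes back negative).
def buggy_abs(x: int) -> int:
    if x == -2**31:
        return x  # the one input where the postcondition abs(x) >= 0 fails
    return x if x >= 0 else -x

# Black-box testing: sample many random inputs, check the postcondition.
random.seed(0)
failures = [x
            for x in (random.randint(-10**6, 10**6) for _ in range(100_000))
            if buggy_abs(x) < 0]
print(failures)  # [] -- the sampled range never hits the bad input

# Yet "for all x, buggy_abs(x) >= 0" is false:
print(buggy_abs(-2**31) >= 0)  # False
```

100,000 green tests, and the universal claim is still wrong; that is the gap between testing and proof.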
On the other hand, “transparent” testing gives, e.g., a type checker access to the internal structure, so it can immediately prove things like “nope, this function cannot match the spec: it will fail by adding a list to a number when fed input X”.
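For instance (a hypothetical function, not from the thread), a static type checker such as mypy rejects the following by inspecting its structure, without running a single test; black-box testing only discovers the same fact dynamically, one input at a time:

```python
from typing import List

# Hypothetically mis-implemented: the spec says "sum the numbers, then add
# a bonus", but the code adds the list itself to a number.
def total_with_bonus(xs: List[int], bonus: int) -> int:
    return xs + bonus  # a type checker proves List[int] + int can never work

# Black-box testing can only rediscover this at runtime, input by input:
try:
    total_with_bonus([1, 2, 3], 10)
except TypeError as e:
    print("fails at runtime:", e)
```

The type checker’s verdict covers every possible input at once; the runtime test covers exactly one.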
As a serious, if trivial, example, imagine black-box testing a quicksort. You test it on 1000 large random lists and measure the average and worst-case running time. You probably get O(n·log n) for both. You deploy the code, someone disassembles it, designs a killer input, and pwns your system, because quicksort has rare inputs on which it degrades to O(n^2).
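The scenario is easy to reproduce. A sketch, assuming the classic textbook quicksort with a first-element pivot (the hidden implementation detail that random black-box timing misses); for that variant the killer input is simply an already-sorted list:

```python
import random

def quicksort(xs, ops):
    # Textbook quicksort, first element as pivot.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    ops[0] += len(rest)  # count elements partitioned against the pivot
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, ops) + [pivot] + quicksort(right, ops)

random.seed(0)
n = 500
c_rand, c_kill = [0], [0]
quicksort(random.sample(range(n), n), c_rand)  # one of the "random lists"
quicksort(list(range(n)), c_kill)              # killer input: already sorted
print(c_rand[0], c_kill[0])  # ~n log n vs n(n-1)/2 = 124750
```

On the sorted list every pivot is the minimum, so each partition strips off one element and the cost is exactly n(n−1)/2 comparisons, i.e., quadratic, while the random list stays near n·log n.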
Transparency isn’t only about reading the source code or not, it’s also about whether you can do formal deduction or not.
Thus the design (i.e. “The Math”) vs. implementation (i.e. “The Code”) division. I believe design verification suffers from the same problems as implementation verification, albeit perhaps less severely (though I have never worked with really complex, novel, abstract math… it would be interesting to see how many of those proofs, on average, are “proved” correct and then blow up).
Still, I would argue the problem is not that black-box testing is insufficient where we can currently apply it, but rather that we have no idea how to properly black-box-test an abstract, novel, complex system!
PS. Your trivial example is also unfair and trivializes the technique. Black-box testing in no way implies randomizing all tests, and I would expect the quicksort to blow up very quickly under serious testing.