My gut reaction is that this doesn’t demonstrate that SPRs are good, just that humans are bad. There are tons of statistical modeling algorithms that are more sophisticated than SPRs.
Unless, of course, SPR is another word for “any statistical modeling algorithm”, in which case this is just the claim that statistical machine learning is a good approach, which anyone as Bayesian as the average LessWronger probably agrees with.
There are tons of statistical modeling algorithms that are more sophisticated than SPRs.
Not in and of itself a good thing. As demonstrated recently, sophisticated statistics can serve mainly to let one tie oneself into a sophisticated knot that is harder to untie. There is a case to be made for promoting the simplest algorithm that outperforms current methods, and SPRs seem to fit that bill.
As for what SPR stands for (statistical prediction rule), the post makes it pretty clear that they are a class of rules that predict a (desired) property from weighted cues (observable properties). I am not familiar enough with statistical modelling to say whether that is a shared goal among all such algorithms.
The post gives an example of an SPR that uses weighted cues. But he specifically says
This particular SPR is called a proper linear model,
indicating that there are other types of SPRs, and I currently have no idea what those other types might be.
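To make the "weighted cues" idea concrete: a proper linear model is just a weighted sum of cues whose weights are fit to past outcome data, e.g. by least squares. A minimal sketch (the cue values and outcomes here are invented for illustration, not taken from the post):

```python
# Sketch of a "proper linear model" SPR: predict an outcome as a
# weighted sum of observable cues, with weights fit by least squares
# on past cases. All numbers below are made up for illustration.
import numpy as np

# Rows: past cases; columns: cues (e.g. a test score, a rating).
cues = np.array([
    [3.0, 1.0],
    [2.0, 4.0],
    [5.0, 2.0],
    [4.0, 5.0],
])
outcomes = np.array([5.0, 10.0, 9.0, 14.0])

# Fit cue weights plus an intercept by ordinary least squares.
X = np.column_stack([cues, np.ones(len(cues))])
weights, *_ = np.linalg.lstsq(X, outcomes, rcond=None)

# A new case is scored by the same weighted sum.
new_case = np.array([4.0, 3.0, 1.0])  # cues plus intercept term
prediction = new_case @ weights
```

The "proper" part is that the weights come from fitting to data; the prediction step itself is nothing but a dot product.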
I agree with you that complicated statistical tests can lead to spurious results; simple statistical tests can also lead to spurious results if the person using them doesn’t understand them. I naively associate both of these with “the test was designed to correct against a different type of flaw in experimental design than the one that actually occurred”.
When the focus of the statistical test is on accurately modeling a given situation, I think it is easier to recognize when a model choice makes sense and when it doesn’t, so more sophisticated approaches will probably do better, since they come closer to carving reality at its joints. This might be an inferential distance error on my part, though, since I have training in this area, so errors that I personally can avoid might not be generally avoidable.
Also, while this isn’t super-relevant, given that I already agree with your claim about people confusing themselves, my impression is that the link you gave presents moderate-to-weak evidence against this.
I didn’t read the entire article that was linked to discussing the statistical analysis (if there’s a particular section you think I should read, please let me know), but my understanding was that in some sense the “experimental procedure” was the issue, not the statistics. In other words, Bem considered potentially hundreds of hypotheses about his data, but only reported on a few, so that p-values of 0.02 are not super-impressive (since out of 100 hypotheses we would expect a few to hit that by chance).
Bem’s experiments all basically ask “is this coin biased”, which isn’t a very complicated question to answer. It is the sophisticated statistics that corrects for the flawed procedure.
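The multiple-comparisons point can be checked directly: if roughly 100 independent tests of true null hypotheses are run, a p-value of 0.02 somewhere is more likely than not, which is why a lone p = 0.02 is unimpressive without correction. A quick sketch (the 100-hypothesis count is the rough figure from the comment above, not an exact count from Bem’s paper):

```python
# With n independent tests of true nulls, each has probability 0.02
# of yielding p <= 0.02, so the chance at least one does is high.
p_single = 0.02
n_tests = 100
p_at_least_one = 1 - (1 - p_single) ** n_tests  # roughly 0.87

# A Bonferroni-style correction keeps the family-wise error rate
# at 0.05 by tightening the per-test threshold:
corrected_threshold = 0.05 / n_tests  # 0.0005; p = 0.02 fails this
```

This is the sense in which the sophisticated statistics corrects for the procedure rather than for the underlying coin-flip question.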
It wasn’t a very good example at all. I basically grepped my memory for “idiot statistics” and that one featured strongly. The problem there was not a misuse of statistical tests, it was a misinterpretation of the significance of statistical tests.
I agree with you for smart people; I do see a lot of value, though, in idiot-proof statistics. Weighted-cue SPRs are almost too simple to screw up.
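As an illustration of how little there is to screw up: a weighted-cue rule can be run with nothing more than standardizing each cue and summing. This is a sketch with invented cue values, assuming equal (unit) weights rather than fitted ones:

```python
# Unit-weight SPR sketch: standardize each cue across candidates and
# sum with weight 1. There is no fitting step at all, which is part
# of what makes it hard to misuse. All numbers are invented.
import statistics

candidates = {
    "A": [3.0, 1.0],
    "B": [2.0, 4.0],
    "C": [5.0, 2.0],
}

# Per-cue mean and standard deviation across candidates.
columns = list(zip(*candidates.values()))
means = [statistics.mean(col) for col in columns]
stdevs = [statistics.stdev(col) for col in columns]

def unit_weight_score(cues):
    """Sum of standardized cue values."""
    return sum((c - m) / s for c, m, s in zip(cues, means, stdevs))

ranked = sorted(candidates,
                key=lambda k: unit_weight_score(candidates[k]),
                reverse=True)
```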