The Logic of the Hypothesis Test: A Steel Man

Related to: Beyond Bayesians and Frequentists

Update: This comment by Cyan clearly explains the mistake I made: I forgot that an ordering of the hypothesis space is necessary for hypothesis testing to work. I'm not entirely convinced that NHST can't be recast in some "thin" theory of induction that may well change the details of the actual test, but I have no idea how to formalize this notion of a "thin" theory, and most of the commenters either 1) misunderstood my aim (my fault, not theirs) or 2) don't think it can be formalized.

I'm teaching an econometrics course this semester and one of the things I'm trying to do is make sure that my students actually understand the logic of the hypothesis test. You can motivate it in terms of controlling false positives (a simulation sketch of that reading follows the syllogism below), but that sort of interpretation doesn't seem to be generally applicable. Another motivation is a simple deductive syllogism with a small but very important inductive component. I'm borrowing the idea from something we discussed in a course I had with Mark Kaiser, who called it the "nested syllogism of experimentation." I think it applies equally well to most or even all hypothesis tests. It goes something like this:

1. Either the null hypothesis or the alternative hypothesis is true.

2. If the null hypothesis is true, then the data has a certain probability distribution.

3. Under this distribution, our sample is extremely unlikely.

4. Therefore under the null hypothesis, our sample is extremely unlikely.

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.
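
To make the "controlling false positives" reading mentioned above concrete, here is a minimal simulation sketch (my own illustration, not part of the original argument; the sample size, seed, and number of simulations are arbitrary choices): when the null hypothesis really is true, a test run at the 0.05 level should reject in roughly 5% of repeated experiments.

```python
# Sketch: simulate many experiments in which the null hypothesis is true
# and check that a level-0.05 t-test rejects about 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, n = 0.05, 10_000, 30
mu_0 = 0.0  # H0: mu = mu_0, and here H0 really is true

rejections = 0
for _ in range(n_sims):
    sample = rng.normal(loc=mu_0, scale=1.0, size=n)
    _, p_value = stats.ttest_1samp(sample, popmean=mu_0)
    rejections += p_value < alpha

print(f"Observed false positive rate: {rejections / n_sims:.3f}")  # roughly 0.05
```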

An example looks like this:

Suppose we have a random sample from a population with a normal distribution that has an unknown mean $\mu$ and unknown variance $\sigma^2$ (a runnable sketch of the whole procedure follows the list). Then:

1. Either $\mu = \mu_0$ or $\mu \neq \mu_0$, where $\mu_0$ is some constant.

2. Construct the test statistic $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $n$ is the sample size, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

3. Under the null hypothesis, $t$ has a $t$-distribution with $n - 1$ degrees of freedom.

4. Under the null hypothesis, the probability of a test statistic at least as extreme as the one observed, $P(|T| \geq |t|)$, is really small (e.g. less than 0.05).

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.
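
A minimal numeric sketch of those six steps (the data and $\mu_0$ here are made up for illustration, and scipy is just one convenient way to get the $t$-distribution):

```python
# Sketch of the one-sample t-test above on made-up data:
# H0: mu = mu_0 versus HA: mu != mu_0.
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 6.0, 5.4, 5.8, 5.2, 6.1, 5.5])  # hypothetical sample
mu_0 = 4.5  # hypothesized mean under the null, chosen for illustration

n = len(x)
t = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))  # step 2: test statistic
p = 2 * stats.t.sf(abs(t), df=n - 1)  # steps 3-4: two-sided P(|T| >= |t|) under H0

print(f"t = {t:.3f}, p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis")         # step 5
    print("Accept the alternative hypothesis")  # step 6
else:
    print("Fail to reject the null hypothesis")
```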

What's interesting to me about this process is that it almost tries to avoid induction altogether. Only the move from step 4 to step 5 seems anything like an inductive argument. The rest is purely deductive, though admittedly it takes a couple of premises in order to quantify just how likely our sample was, and that surely has something to do with induction. But it's still a bit like solving the problem of induction by sweeping it under the rug and then putting a big heavy deduction table on top so no one notices the lumps underneath.

This sounds like a criticism, but actually I think it might be a virtue to minimize the amount of induction in your argument. Suppose you're really uncertain about how to handle induction. Maybe you see a lot of plausible-sounding approaches, but you can poke holes in all of them. So instead of trying to actually solve the problem of induction, you set out to come up with a process which is robust to alternative views of induction. Ideally, if one or another theory of induction turns out to be correct, you'd like it to do the least damage possible to any specific inductive inferences you've made. One way to do this is to avoid induction as much as possible, so that you prevent "inductive contamination" from spreading to everything you believe.

That's exactly what hypothesis testing seems to do. You start with a set of premises and keep deriving logical conclusions from them until you're forced to say "this seems really unlikely if a certain hypothesis is true, so we'll assume that the hypothesis is false" in order to get any further. Then you just keep on deriving logical conclusions with your new premise. Bayesians start yelling about the base rate fallacy in the inductive step, but they're presupposing their own theory of induction. If you're trying to be robust to inductive theories, why should you listen to a Bayesian instead of anyone else?
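
For what the objection is worth, here is a hedged sketch of the base rate point (the prior and the test's power are numbers I'm assuming purely for illustration): even after a rejection at the 0.05 level, the null hypothesis can remain quite probable.

```python
# Sketch of the Bayesian base-rate objection with made-up numbers: the
# posterior probability of the null after a rejection depends on how common
# true nulls are a priori and on the test's power, not just on alpha.
alpha = 0.05     # P(reject | H0 true)
power = 0.80     # P(reject | H0 false) -- assumed for illustration
prior_h0 = 0.90  # assumed prior probability that the null is true

# Bayes' rule: P(H0 | reject) = P(reject | H0) * P(H0) / P(reject)
p_reject = alpha * prior_h0 + power * (1 - prior_h0)
posterior_h0 = alpha * prior_h0 / p_reject
print(f"P(H0 | reject) = {posterior_h0:.2f}")  # ~0.36 with these numbers
```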

Now does hypothesis testing actually accomplish induction that is robust to philosophical views of induction? Well, I don't know; I'm really just spitballing here. But it does seem to be a useful steel man.