# gwern comments on The Logic of the Hypothesis Test: A Steel Man

• What’s interesting to me about this process is that it almost tries to avoid induction altogether. Only the move from step 4 to 5 seems anything like an inductive argument. The rest is purely deductive, though admittedly it takes a couple of premises to quantify just how likely our sample was, and that surely has something to do with induction. But it’s still a bit like solving the problem of induction by sweeping it under the rug, then putting a big heavy deduction table on top so no one notices the lumps underneath.

One interesting thing here is that you start with a null vs. other hypothesis, but that’s because you’re doing a two-sample z/t-test. But what’s going on when you do a one-sample z/t-test and get out a confidence interval?

• The one-sided hypothesis test is still null vs. other because it uses the full parameter space, i.e. it’s H0: mu <= c vs. Ha: mu > c. We present it to undergrads as H0: mu = c vs. Ha: mu > c in order to simplify (I think that’s the reason anyway), but really we’re testing the former. The Karlin-Rubin theorem justifies this.

• I don’t follow… that sounds like you’re giving the definition of a one-tailed hypothesis test. What does that have to do with a constant c? Suppose I do this in R:

``````
R> set.seed(12345); t.test(rnorm(20))

One Sample t-test

data:  rnorm(20)
t = 0.4103, df = 19, p-value = 0.6861
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.3138  0.4668
sample estimates:
mean of x
0.07652
``````

And get a 95% CI of (-0.3138, 0.4668); if my null hypothesis (H0) is my mu or sample mean (0.07652), then you say my Ha is mu > c, or 0.07652 > c. What is this c?

• So rereading your first comment, I realize you said one-sample vs. two-sample hypothesis test and not one-sided vs. two-sided (or one-tailed vs. two-tailed). If that’s what you meant, I don’t follow your first comment. The t-test I gave in the post is a one-sample test, and I don’t understand how the difference between the two is relevant here.

But to answer your question anyway:

> I don’t follow… that sounds like you’re giving the definition of a one-tailed hypothesis test. What does that have to do with a constant c? Suppose I do this in R:
>
> And get a 95% CI of (-0.3138, 0.4668); if my null hypothesis (H0) is my mu or sample mean (0.07652), then you say my Ha is mu > c, or 0.07652 > c. What is this c?

c is the value you’re testing as the null hypothesis. In that R code, R assumes that c = 0, so that H0: mu = c and Ha: mu != c. For the R code:

``````
t.test(data, alternative = "greater", mu = c)
``````

You perform a t test with H0: mu<=c and Ha: mu>c.
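A minimal sketch of that one-sided call, assuming a simulated sample `x` and an illustrative null value `c0` (both are assumptions made for the example; the name `c0` is used only to avoid confusion with R's `c()` function):

```r
# Simulated data and null value: illustrative assumptions, not from the thread.
set.seed(12345)
x  <- rnorm(20, mean = 100, sd = 15)
c0 <- 95

# One-sided test of H0: mu <= c0 against Ha: mu > c0.
res <- t.test(x, alternative = "greater", mu = c0)

res$statistic   # t = (mean(x) - c0) / (sd(x) / sqrt(n))
res$p.value     # upper-tail probability of the t distribution at that t
res$conf.int    # one-sided interval: finite lower bound, upper bound Inf
```

With `alternative = "greater"` the reported interval is one-sided as well, so it is not directly comparable to the two-sided intervals printed elsewhere in this thread.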

• I’m interested in the calculated confidence interval, not the p-value necessarily. Noodling around some more, I think I’m starting to understand it more: the confidence interval isn’t calculated with respect to the H0 of 0 which the R code defaults to; it’s calculated from the sample alone, centered on the sample mean (and then an H0 of 0 is assumed to spit out some p-value):

``````
R> set.seed(12345); t.test(rnorm(20,100,15))

One Sample t-test

data:  rnorm(20, 100, 15)
t = 36.16, df = 19, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
95.29 107.00
sample estimates:
mean of x
101.1
R>
R> 107-95.29
[1] 11.71
R> 107 - (11.71/2)
[1] 101.1
``````
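One way to check that observation directly is to run the same two-sided test with two different null values and compare the intervals. This is a sketch under the assumption that the default two-sided `t.test()` is being used; the second null value of 100 is chosen only for illustration:

```r
set.seed(12345)
x <- rnorm(20, mean = 100, sd = 15)

# Same data, two different null values.
res0   <- t.test(x, mu = 0)
res100 <- t.test(x, mu = 100)

# The two-sided interval is built from mean(x) and its standard error,
# so the null value mu should drop out of it.
same_ci <- isTRUE(all.equal(as.numeric(res0$conf.int),
                            as.numeric(res100$conf.int)))
same_ci

# The p-values differ, because only they depend on the null value.
c(p_mu0 = res0$p.value, p_mu100 = res100$p.value)
```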

Hm… I’m trying to fit this assumption into your framework....

1. Either h0, true mean = sample mean; or ha, true mean != sample mean

2. construct the test statistic: ‘t = (sample mean - sample mean) / (s/sqrt(n))’

3. ‘t = 0 / (s/sqrt(n))’; t = 0

4. … a confidence interval
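The steps above can be tried out literally by passing the sample mean as the null value. This is only a sketch of what that framing produces in R, using the same simulated data as the earlier run; it is not a claim that this is how the interval is derived:

```r
set.seed(12345)
x <- rnorm(20, mean = 100, sd = 15)

# Null value set to the sample mean itself, as in step 1 of the list.
res <- t.test(x, mu = mean(x))

unname(res$statistic)  # the numerator mean(x) - mean(x) is zero, so t is zero
res$p.value            # a t of zero is as unsurprising as possible under H0
res$conf.int           # the interval is still reported, centered on mean(x)
```

Testing the sample mean against itself carries no information, which suggests the interval has to come from somewhere other than this single test.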

• A 95% confidence interval is sort of like testing H0: mu = c vs. Ha: mu != c for all values of c at the same time. In fact, if you reject the null hypothesis for a given c when c is outside your calculated confidence interval and fail to reject otherwise, you’re performing the exact same t-test with the exact same rejection criterion as the usual one (that is, rejecting when the p-value is less than 0.05).
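That equivalence can be checked numerically. The sketch below assumes the same simulated sample as the earlier runs and a hypothetical grid of candidate null values; it compares the "p-value below 0.05" decision with the "c outside the interval" decision for each candidate:

```r
set.seed(12345)
x  <- rnorm(20, mean = 100, sd = 15)
ci <- t.test(x)$conf.int   # 95% two-sided interval

# Hypothetical grid of candidate null values c (an assumption for illustration).
candidates <- seq(85, 115, by = 0.3)

# Decision 1: reject when the two-sided p-value is below 0.05.
reject_by_p <- sapply(candidates,
                      function(c0) t.test(x, mu = c0)$p.value < 0.05)

# Decision 2: reject when c lies outside the confidence interval.
reject_by_ci <- candidates < ci[1] | candidates > ci[2]

# TRUE when both rules give the same decision for every candidate.
agree <- all(reject_by_p == reject_by_ci)
agree
```

A candidate sitting exactly on an interval endpoint could in principle be decided differently by floating-point rounding, so the grid is only a spot check, not a proof.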

The formula for the test statistic is (generally) t = (estimate - c) / (standard error of estimate), while the formula for a confidence interval is (generally) estimate +/- `t^*` × (standard error of estimate), where `t^*` is a quantile of the t distribution with appropriate degrees of freedom, chosen according to your desired confidence level. `t^*` and the threshold for rejecting the null in a hypothesis test are intimately related. If you google “confidence intervals and p values” I’m sure you’ll find a more polished and detailed explanation of this than mine.
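Both formulas can be written out by hand for the one-sample mean and compared with what `t.test()` reports. This is a sketch assuming the same simulated sample as before, a 95% two-sided interval, and a null value `c0` of 0 (the variable names are made up for the example):

```r
set.seed(12345)
x  <- rnorm(20, mean = 100, sd = 15)
c0 <- 0                            # null value; 0 is t.test()'s default for mu

n        <- length(x)
estimate <- mean(x)
se       <- sd(x) / sqrt(n)        # standard error of the sample mean
t_stat   <- (estimate - c0) / se   # test statistic
t_star   <- qt(0.975, df = n - 1)  # quantile for a 95% two-sided interval
ci       <- estimate + c(-1, 1) * t_star * se

# Rejecting when |t| > t_star is the same event as c0 falling outside ci.
reject_by_t  <- abs(t_stat) > t_star
reject_by_ci <- c0 < ci[1] || c0 > ci[2]
c(by_t = reject_by_t, by_ci = reject_by_ci)

# Cross-check against the built-in routine.
res <- t.test(x, mu = c0)
all.equal(unname(res$statistic), t_stat)
all.equal(as.numeric(res$conf.int), ci)
```

The same quantile `t_star` appears in both the rejection threshold and the interval width, which is the relationship described above.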