Bayes Questions

For the first time ever I’ve had reason to use formal Bayesian statistics in my work. I feel this is a cause for streamers and confetti.

However, I’ve got a bit stuck on analysing my confidence levels and I thought what better place than lesswrong to check that I’m making sense. I’m not sure this is strictly what lesswrong is for, but the site says that this is my own personal blog so I guess it’s ok?! I can always be down-voted to hell if not!

***

I’m trying to calculate the estimated life before failure of a particular component.

We’ve done a number of tests and the results show a larger than expected variance, with some components not failing even after an extended lifetime. I’m trying to analyse the results to see which probabilistic failure distribution best suits the available data. I have three different distributions (Weibull, log-normal & Birnbaum-Saunders), each of which has a shape parameter and a scale parameter.

For each distribution I created a grid which samples the possible values of these parameters. I’ve given the parameters a log-uniform prior by giving each sampled parameter pair a uniform prior but sampling the parameters geometrically (i.e. each sampled value of the parameter is a fixed multiple of the previous value). I’ve tried other priors and the results seem fairly robust over the choice of prior.
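For concreteness, here’s roughly what that grid looks like in Python. The ranges, grid size and the choice of Weibull as the running example are illustrative assumptions on my part, not the actual values from the analysis:

```python
import numpy as np

# Geometric sampling: each grid value is a fixed multiple of the previous
# one, so a uniform weight on grid points is a log-uniform prior on the
# parameters themselves.
n = 50  # grid points per parameter (illustrative)
shapes = np.geomspace(0.5, 5.0, n)        # shape parameter
scales = np.geomspace(100.0, 10000.0, n)  # scale parameter, in hours

# One hypothesis per (shape, scale) pair, all equally likely a priori.
shape_grid, scale_grid = np.meshgrid(shapes, scales, indexing="ij")
prior = np.full(shape_grid.shape, 1.0 / shape_grid.size)
```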

For components which have failed, P(E|H) is the probability density function evaluated at the number of hours before failure.

For components which get to a certain age and do not fail, P(E|H) is 1 minus the cumulative distribution function at this number of hours (i.e. the survival function).
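In scipy terms (continuing the sketch above, and again using the Weibull as the running example; `stats.lognorm` and `stats.fatiguelife` play the same role for the other two distributions), the likelihood of a single test result is:

```python
from scipy import stats

def likelihood(hours, failed, shape_grid, scale_grid):
    """P(E|H) for one test result, over the whole Weibull parameter grid."""
    dist = stats.weibull_min(shape_grid, scale=scale_grid)
    if failed:
        return dist.pdf(hours)  # density at the observed failure time
    return dist.sf(hours)       # survival function, 1 - CDF, at the unfailed age
```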

This is implemented on a spreadsheet with a tab for each test result. Each tab updates the prior probability into a posterior probability and then uses this as the prior for the next tab. The result is normalised to give a total probability of 1.
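The tab-by-tab update amounts to multiplying by each test’s likelihood and renormalising. A minimal sketch, assuming the test results are collected as `(hours, failed)` pairs (the data here is made up):

```python
# One (hours, failed) pair per spreadsheet tab; illustrative numbers only.
tests = [(1200.0, True), (3000.0, False), (1800.0, True)]

posterior = prior.copy()
for hours, failed in tests:
    posterior *= likelihood(hours, failed, shape_grid, scale_grid)
    posterior /= posterior.sum()  # normalise so the hypotheses sum to 1
```

Normalising once at the end gives the same posterior as normalising after every tab, since the normalisation is just a constant factor.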

Initially I calculate the expected life of the worst component in 1,000,000. For this, I just use the inverse cumulative probability function with p = 0.000001 and calculate this for all of the potential probability distributions.

The results of this calculation are multiplied by the final posterior probabilities of each distribution being the correct one. Then I sum this over the entire hypothesis space to give the expected life of the worst component in a million.
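Continuing the sketch, those two steps together are just a posterior-weighted average of the inverse CDF:

```python
# Life of the worst component in 1,000,000 under each hypothesis:
# the inverse CDF (scipy's percent point function) at p = 1e-6.
dist = stats.weibull_min(shape_grid, scale=scale_grid)
worst_life = dist.ppf(1e-6)

# Expectation over the whole hypothesis space, weighted by the posterior.
expected_worst = (posterior * worst_life).sum()
```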

So my first question: is all of the above sound, or have I made some silly mistake in my logic?

***

The part that I’m less confident about is how to analyse my 95% confidence level for the life of the worst component in 1,000,000.

The obvious way to approach this is to calculate my expected value for the worst component in 20,000,000: for any given million that I select, I then have a 5% chance of selecting the worst in 20,000,000. This treats 95% as my confidence from the overall weighted model.
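In the sketch’s terms this is the same weighted average with p = 1/20,000,000 instead of 1/1,000,000:

```python
# Overall-model reading of the 95% level: the expected life of the worst
# component in 20,000,000, i.e. the inverse CDF at p = 1/20,000,000.
worst_in_20m = dist.ppf(1.0 / 20_000_000)
conf95_overall = (posterior * worst_in_20m).sum()
```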

Alternatively, I can treat the 95% as referring to my confidence about which probability distribution is the correct one. In this case, after I normalise my probabilities so that the sum over all of the hypotheses is 1, I start deleting the least likely hypotheses. I keep deleting unlikely hypotheses until the sum of all of the remaining hypotheses is <0.95. The last hypothesis which was deleted is the top end of my 95% confidence level.

Now if I calculate the expected life of the worst component in 1,000,000 for that individual model, I think I can argue that this also represents my 95% confidence level for the worst component in 1,000,000, but arrived at in a different way.
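A sketch of that procedure, ranking hypotheses from most to least probable (which finds the same boundary hypothesis as deleting from the least probable end):

```python
# The hypothesis at which the cumulative mass first reaches 0.95 is the
# boundary of the credible set (the "last deleted" hypothesis above).
order = np.argsort(posterior, axis=None)[::-1]    # most to least probable
cum_mass = np.cumsum(posterior.ravel()[order])
boundary = order[np.searchsorted(cum_mass, 0.95)]

# Worst-in-a-million life under that single boundary hypothesis.
conf95_hypothesis = worst_life.ravel()[boundary]
```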

Is either of these better than the other? Is there an alternative definition of confidence level which I should be using?

The confidence-in-which-distribution version gives much more conservative answers, particularly when the number of tests is small; the confidence-from-overall-model version is much more forgiving of having fewer tests. Even after only a couple of tests the latter gives a 95% confidence level relatively close to the expected value, whereas the confidence-in-which-distribution version remains further away from the expected value until a larger number of tests is performed.

That more conservative behaviour seems more realistic to me, but I don’t have a proper argument to actually justify a decision either way.

Any help warmly welcomed.