Bayes Questions

For the first time ever I’ve had reason to use formal Bayesian statistics in my work. I feel this is a cause for streamers and confetti.

However, I’ve got a bit stuck on analysing my confidence levels and I thought what better place than lesswrong to check that I’m making sense. I’m not sure this is strictly what lesswrong is for, but the site says that this is my own personal blog so I guess it’s ok?! I can always be down-voted to hell if not!


I’m trying to calculate the estimated life before failure of a particular component.

We’ve done a number of tests and the results show a larger than expected variance, with some components not failing even after an extended lifetime. I’m trying to analyse the results to see which probabilistic failure distribution best suits the available data. I have three different distributions (Weibull, log-normal & Birnbaum-Saunders), each of which has a shape parameter and a scale parameter.

For each distribution I created a grid which samples the possible values of these parameters. I’ve given the parameters a log-uniform prior by giving each sampled parameter pair a uniform prior but spacing the samples geometrically (i.e. each sampled value of the parameter is a fixed multiple of the previous value). I’ve tried other priors and the results seem fairly robust to the choice of prior.
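As a sketch of the grid setup, here it is in Python with numpy (standing in for the spreadsheet; the parameter ranges and grid size below are made up for illustration, not the real values):

```python
import numpy as np

# Hypothetical ranges and grid resolution -- not the real values.
n_points = 50
shapes = np.geomspace(0.2, 5.0, n_points)          # shape parameter, geometric spacing
scales = np.geomspace(100.0, 100_000.0, n_points)  # scale parameter (hours)

# One hypothesis per (shape, scale) pair, each with equal prior mass,
# which is a uniform prior over the geometrically spaced grid.
shape_grid, scale_grid = np.meshgrid(shapes, scales, indexing="ij")
prior = np.full(shape_grid.shape, 1.0 / shape_grid.size)
```

Because the points are geometrically spaced, a uniform weight per grid point is (approximately) a log-uniform prior over each parameter.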

For components which have failed, P(E|H) is the probability density function evaluated at the number of hours before failure.

For components which get to a certain age and do not fail, P(E|H) is 1 minus the cumulative distribution function at this number of hours (i.e. the survival function).

This is implemented on a spreadsheet with a tab for each test result. Each tab updates the prior probability into a posterior probability, which is then used as the prior for the next tab. The result is normalised to give a total probability of 1.
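The per-test update can be sketched like this (again a stand-in for the spreadsheet, using only the Weibull family via scipy’s `weibull_min`; the grid and the test data are made up):

```python
import numpy as np
from scipy import stats

# Small illustrative grid of (shape, scale) hypotheses -- made-up values.
shapes = np.geomspace(0.5, 2.0, 3)
scales = np.geomspace(1_000.0, 10_000.0, 3)
shape_grid, scale_grid = np.meshgrid(shapes, scales, indexing="ij")
prior = np.full(shape_grid.shape, 1.0 / shape_grid.size)

def update(prior, hours, failed):
    """One Bayesian update for a single test result.

    For a failed component P(E|H) is the pdf at the failure time;
    for a survivor it is 1 - cdf (the survival function) at its current age.
    """
    dist = stats.weibull_min(c=shape_grid, scale=scale_grid)
    likelihood = dist.pdf(hours) if failed else dist.sf(hours)
    posterior = prior * likelihood
    return posterior / posterior.sum()  # renormalise to total probability 1

# Fold the tests in one at a time; each posterior becomes the next prior.
tests = [(1_200.0, True), (5_000.0, False), (800.0, True)]  # (hours, failed?) -- made-up data
posterior = prior
for hours, failed in tests:
    posterior = update(posterior, hours, failed)
```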

Initially I calculate the expected life of the worst component in 1,000,000. For this, I just use the inverse cumulative distribution function with p = 0.000001 and calculate this for all of the potential probability distributions.

The results of this calculation are multiplied by the final probabilities of each distribution being the correct one. Then I sum this over the entire hypothesis space to give the expected life of the worst component in a million.
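That posterior-weighted average might look like this (continuing the Weibull-only sketch; `posterior`, `shape_grid` and `scale_grid` are assumed to come from the update step):

```python
import numpy as np
from scipy import stats

def expected_worst_in(n, posterior, shape_grid, scale_grid):
    """Posterior-weighted expected life of the worst component in n.

    For each hypothesis the worst-in-n life is approximated by the
    inverse cdf (ppf) at p = 1/n; these are then averaged using the
    final posterior probabilities as weights.
    """
    worst = stats.weibull_min(c=shape_grid, scale=scale_grid).ppf(1.0 / n)
    return float(np.sum(posterior * worst))
```

With a single hypothesis of probability 1 this reduces to the plain inverse cdf at 1/n, which is a useful sanity check.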

So my first question – is all of the above sound, or have I made some silly mistake in my logic?


The part that I’m less confident about is how to analyse my 95% confidence level for the life of the worst component in 1,000,000.

The obvious way to approach this is to calculate my expected value for the worst component in 20,000,000. Then, for any given million that I select, I have a 5% chance of it containing the worst in 20,000,000. This treats 95% as my confidence from the overall weighted model.
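In the sketch above this is the same posterior-weighted quantile, just evaluated at p = 1/20,000,000 instead of 1/1,000,000:

```python
import numpy as np
from scipy import stats

def overall_model_bound(posterior, shape_grid, scale_grid):
    """95% bound from the overall weighted model: the posterior-weighted
    expected life of the worst component in 20,000,000 (ppf at 1/20e6),
    since any given million has a 5% chance of containing it."""
    worst = stats.weibull_min(c=shape_grid, scale=scale_grid).ppf(1.0 / 20_000_000)
    return float(np.sum(posterior * worst))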

Alternatively, I can treat the 95% as referring to my confidence in which is the one correct probability distribution. In this case, after I normalise my probabilities so that the sum over all hypotheses is 1, I start deleting the least likely hypotheses. I keep deleting unlikely hypotheses until the sum of the remaining hypotheses is <0.95. The last hypothesis deleted is the top end of my 95% confidence level.
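The deletion procedure can be sketched like this (same Weibull-only stand-in; hypotheses are sorted by posterior mass and trimmed from the bottom):

```python
import numpy as np
from scipy import stats

def credible_set_bound(posterior, shape_grid, scale_grid, level=0.95, n=1_000_000):
    """Delete least-likely hypotheses until the remaining mass first drops
    below `level`. The last hypothesis deleted marks the boundary of the
    95% set, and its worst-in-n life (ppf at 1/n) is the reported bound."""
    order = np.argsort(posterior, axis=None)    # least likely first
    deleted = np.cumsum(posterior.ravel()[order])  # mass deleted so far
    # Remaining mass < level exactly when deleted mass > 1 - level.
    i = np.searchsorted(deleted, 1.0 - level, side="right")
    shape_b = shape_grid.ravel()[order[i]]
    scale_b = scale_grid.ravel()[order[i]]
    return float(stats.weibull_min(c=shape_b, scale=scale_b).ppf(1.0 / n))
```

For example, with posterior masses (0.01, 0.04, 0.05, 0.9), deleting 0.01 leaves 0.99, deleting 0.04 leaves 0.95 (not yet < 0.95), and deleting 0.05 leaves 0.90, so the 0.05 hypothesis is the boundary.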

Now if I calculate the expected life of the worst component in 1,000,000 for that individual model, I think I can argue that this also represents my 95% confidence level for the worst component in 1,000,000, but in a different way.

Is either of these better than the other? Is there an alternative definition of confidence level which I should be using?

The confidence-in-which-distribution version gives much more conservative answers, particularly when the number of tests is small; the confidence-from-overall-model version is much more forgiving of having fewer tests. Even after only a couple of tests the latter gives a 95% confidence level relatively close to the expected value, whereas the confidence-in-which-distribution version remains further from the expected value until a larger number of tests has been performed.

The more conservative behaviour seems more realistic to me, but I don’t have a proper argument to actually justify a decision either way.

Any help warmly welcomed.