Do you have examples of systems that reach this kind of reliabilty internally?
Most high-9 systems work by taking lots of low-9 components, and relying on not all of them failing at the same time. I.e. if you have 10 95% systems that fail completely independently, and you only need one of them to work, that gets you like eleven nines (99.9{11}%).
Expecting a person to be 99% reliable is ridiculous. That’s like two sick days per year, ignoring all other possible causes of failing to make a task. Instead you should build systems and organisations that have slack, so that one person failing at a particular point in time doesn’t make a project/org fail.
Well, in general, I’d say achieving that reliability through redundant means is totally reasonable, whether in engineering or people-based systems.
At a component level? Lots of structural components, for example. Airplane wings stay attached at fairly high reliability, and my impression is that while there is plenty of margin in the strength of the attachment, it’s not like the underlying bolts are being replaced because they failed with any regularity.
I remember an aerospace discussion about a component (a pressure switch, I think?). NASA wanted documentation for 6 9s of reliability, and expected some sort of very careful fault tree analysis and testing plan. The contractor instead used an automotive component (brake system, I think?), and produced documentation of field reliability at a level high enough to meet the requirements. Definitely an example where working to get the underlying component that reliable was probably better than building complex redundancy on top of an unreliable component.
Do you have examples of systems that reach this kind of reliabilty internally?
Most high-9 systems work by taking lots of low-9 components, and relying on not all of them failing at the same time. I.e. if you have 10 95% systems that fail completely independently, and you only need one of them to work, that gets you like eleven nines (99.9{11}%).
Expecting a person to be 99% reliable is ridiculous. That’s like two sick days per year, ignoring all other possible causes of failing to make a task. Instead you should build systems and organisations that have slack, so that one person failing at a particular point in time doesn’t make a project/org fail.
Well, in general, I’d say achieving that reliability through redundant means is totally reasonable, whether in engineering or people-based systems.
At a component level? Lots of structural components, for example. Airplane wings stay attached at fairly high reliability, and my impression is that while there is plenty of margin in the strength of the attachment, it’s not like the underlying bolts are being replaced because they failed with any regularity.
I remember an aerospace discussion about a component (a pressure switch, I think?). NASA wanted documentation for 6 9s of reliability, and expected some sort of very careful fault tree analysis and testing plan. The contractor instead used an automotive component (brake system, I think?), and produced documentation of field reliability at a level high enough to meet the requirements. Definitely an example where working to get the underlying component that reliable was probably better than building complex redundancy on top of an unreliable component.