I didn’t realize these aids for predicting MTBF existed, but I am not surprised by them.
My main point is that the data are extrapolated from a short time frame and a finite number of tests to predict the actual failure frequency in real life, and that these predictive methods involve complicated statistical techniques.
While there may be thousands of drives manufactured, the MTBF is usually published before they are made or subjected to any real-world use. The high numbers simply serve to narrow the confidence interval around the predicted MTBF.
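To put rough numbers on how sample size narrows a confidence interval, here is a minimal sketch. It assumes a simple normal approximation for the mean lifetime, and the standard deviation is a made-up figure for illustration only; real MTBF work would use a proper life-distribution model.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_half_width(sample_std_dev, sample_size, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    # Two-sided interval: put half of the leftover probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sample_std_dev / sqrt(sample_size)


# Hypothetical spread of observed lifetimes, in hours (an assumption).
assumed_std_dev_hours = 50_000

for drives_tested in (10, 100, 1000):
    half_width = mean_ci_half_width(assumed_std_dev_hours, drives_tested)
    print(f"{drives_tested:>5} drives: +/- {half_width:,.0f} hours")
```

The interval shrinks with the square root of the sample size, so testing four times as many drives only halves the width. It also says nothing about whether the test conditions match real-world use.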
Tests that rely on a time-temperature acceleration factor are themselves based on a statistical correlation between the increase in failures and the increase in temperature.
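One common form of a time-temperature acceleration factor is the Arrhenius model. Whether any particular manufacturer uses it is an assumption on my part, and the activation energies and temperatures below are illustrative values, not figures for any real drive. The sketch shows how much the answer depends on a parameter that is itself fitted from failure data.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def arrhenius_acceleration_factor(use_temp_c, stress_temp_c, activation_energy_ev):
    """Ratio of the failure rate at the stress temperature to the rate at the use temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / use_k - 1 / stress_k)
    return exp(exponent)


# Hypothetical test: drives normally run at 40 C, stress test at 70 C.
for assumed_activation_energy_ev in (0.4, 0.7):
    factor = arrhenius_acceleration_factor(40, 70, assumed_activation_energy_ev)
    print(f"Ea = {assumed_activation_energy_ev} eV: acceleration factor {factor:.1f}")
```

A modest change in the assumed activation energy changes the acceleration factor substantially, and with it the MTBF extrapolated from the test.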
Calculating assembly failures from component failures is also a statistical technique. A simplified example: if I have two components, both have to fail for the assembly to fail, and each has an independent 50% probability of failure, then the probability of assembly failure is 25%.
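The two-component example can be sketched in a few lines. This assumes the component failures are independent, which is the assumption behind the 25% figure. The function names are my own, and the second function (assembly fails if any one component fails) is added only for contrast.

```python
from math import prod


def all_must_fail_probability(component_failure_probs):
    """Assembly fails only if every component fails (redundant components)."""
    return prod(component_failure_probs)


def any_one_fails_probability(component_failure_probs):
    """Assembly fails as soon as any single component fails."""
    # Probability that at least one fails = 1 - probability that all survive.
    return 1 - prod(1 - p for p in component_failure_probs)


both_must_fail = all_must_fail_probability([0.5, 0.5])
either_fails = any_one_fails_probability([0.5, 0.5])

print(both_must_fail)  # 0.25
print(either_fails)    # 0.75
```

The same two components give a 25% or a 75% assembly failure probability depending on how they are arranged, so the structure of the assembly matters as much as the component numbers.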
Even if an extremely large number of drives were tested, or you had the complete storm data for some location, predicting failure rates would still be a statistical exercise.
The predicted storm frequency or drive failure rates could also be totally nullified by some change in the underlying conditions, like climate change that affected rainfall or a new motherboard that affected drive reliability.
Any time you use a statistic you should ask a couple of questions. (First you have to realize that the number you are using is a statistically based number.)
What is the sample size? Is the 1:100 year storm based on 30 or 200 years of data?
How reliable is the data? Are all data points relating to the same measurement? Did they use a different type of rain gauge 150 years ago?
Has some underlying factor changed that makes the data worthless as a predictor of future events? For example, global warming.
What is the confidence interval at some statistical level of assurance? Is the range x to y, and are you 95% sure that the true value lies in this range?
How important is the potential variance in the analysis to what you are doing? I can live with a wide range of storm rainfalls without any significant impact. Often a storm sewer designed for the 5-year rainfall will also accommodate the 10- or 20-year rainfall, because the 5-year number forces me to use a pipe size that will carry larger flows. In this case I really don’t care about the confidence interval. If the pipe size is just big enough for the 5-year flow, then I might be concerned if the actual flow exceeds the predicted 5-year flow.
If you are doing any regression analysis, how strong is the correlation between the independent and the dependent variable? In a weak correlation, the effect of a third variable may nullify all conclusions.
Is there some logical reason for a correlation to exist? Someone once did a correlation analysis between the length of Vanna White’s hemlines on Wheel of Fortune (a US game show for those outside North America) and the next day’s performance in the stock market. The cause and effect here is hard to see so this could simply be a statistical anomaly.
Rick Kitson MBA P.Eng
Construction Project Management
From conception to completion