Grid MTBF or Failure Rate
(OP)
We have a grid of 20 computers where each piece of data is written to at least two computers. A failure or loss of data would occur when two servers fail at the same time. If each computer has an MTBF of 45k hours, how do I go about calculating the system MTBF where a system failure only occurs when 2 or more nodes fail at the same time?
We can assume 0 time to repair, because within less than a minute the backup data becomes primary and the primary creates a backup across the remaining members of the grid.
Thanks, Gary
RE: Grid MTBF or Failure Rate
I'm really only concerned with losing 2 computers at the same exact time.
RE: Grid MTBF or Failure Rate
There are 20C2, or 190, different possible computer/computer pairings for a specific data set at a given time. Of those 190 pairings, only the 1 pair that actually holds the data matters: the data is lost only if both computers in that pair fail together. Based on that, I would say the reliability is officially "very high".
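As a quick sanity check on that count, here is a minimal Python sketch. The node labels are made up purely for illustration.

```python
import itertools
import math

# Hypothetical labels for the 20 computers in the grid.
nodes = [f"node{i:02d}" for i in range(1, 21)]

# Every unordered pairing of two distinct computers.
pairs = list(itertools.combinations(nodes, 2))

# Closed-form count of the same thing, 20C2.
n_pairs = math.comb(20, 2)

print(n_pairs, len(pairs))
```

Both numbers should agree, since the enumeration and the closed form count the same set of pairings.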
To put some more meat on this, if you assume the computer failure rates follow an exponential distribution (not sure about how accurate it is for computers), then the reliability of 1 computer is exp(-t/MTBF).
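To make the single-computer figure concrete, here is a small Python sketch of exp(-t/MTBF) using the 45k hour MTBF from the question. The function name and the one-year mission time (8760 hours) are just illustrative choices, and the exponential model is the same assumption as above.

```python
import math

MTBF_HOURS = 45_000.0  # per-computer MTBF from the original question


def reliability(t_hours, mtbf_hours=MTBF_HOURS):
    """Probability that one computer survives t_hours (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)


# Illustrative mission time: one year of continuous operation.
one_year = reliability(8760.0)
print(f"Single-computer reliability over one year: {one_year:.4f}")
```

Under these assumptions a single computer has roughly an 82% chance of running a full year without a failure.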
Since the exponential distribution is continuous, the probability of the second computer failing at exactly the same time approaches 0. Let's be conservative and assume it's 1/100th of the first probability. This implies a higher reliability for the second computer.
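The 1/100th figure is a guess. One way to get a feel for how small the second probability really is would be to use the "less than a minute" re-replication time from the original question as the exposure window. The sketch below does that in Python. The one-minute window, the exponential model, and all names are assumptions for illustration, not a definitive treatment.

```python
import math

MTBF_HOURS = 45_000.0      # per-computer MTBF from the original question
WINDOW_HOURS = 1.0 / 60.0  # assumed worst case: one minute to re-create the backup


def prob_fail_within(window_hours, mtbf_hours=MTBF_HOURS):
    """Probability that one computer fails within window_hours (exponential model)."""
    # 1 - exp(-x), computed with expm1 to keep precision when x is tiny.
    return -math.expm1(-window_hours / mtbf_hours)


# Chance that the one computer holding the backup copy also fails
# before the data has been re-replicated.
p_partner = prob_fail_within(WINDOW_HOURS)
print(f"{p_partner:.3e}")
```

For a window this short the result is essentially window/MTBF, which is several orders of magnitude below 1/100.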
The two computers would operate much like a parallel system, giving a 2-computer system reliability of: 1-(1-exp(-t/MTBF))(1-exp(-t/MTBF)).
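Here is the parallel-pair formula as a short Python sketch, again with the 45k hour MTBF from the question. The function name and the one-year mission time are illustrative, and the formula treats the pair as a simple parallel system with no repair.

```python
import math

MTBF_HOURS = 45_000.0  # per-computer MTBF from the original question


def pair_reliability(t_hours, mtbf_hours=MTBF_HOURS):
    """Reliability of two computers in parallel: 1 - (1 - R)(1 - R)."""
    # Probability that a single computer has failed by time t_hours.
    f_single = 1.0 - math.exp(-t_hours / mtbf_hours)
    # The pair is lost only if both computers have failed.
    return 1.0 - f_single * f_single


print(f"Pair reliability over one year: {pair_reliability(8760.0):.4f}")
```

Over one year this comes out near 97%, compared with roughly 82% for a single computer.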
Apply this probability to the binomial distribution to see what the probability of occurrence is: 20C2 * (1-(1-exp(-t/MTBF))(1-exp(-t/MTBF)))^1 * ((1-exp(-t/MTBF))(1-exp(-t/MTBF)))^189. This quickly approaches 0.
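To see how quickly that expression shrinks, here is a Python sketch that evaluates it. It takes the expression as written in this post: the pair reliability raised to the power 1, its complement raised to the power 189, and a coefficient of 190. Treating the 190 pairings as independent trials is a simplification, and the function names and one-year mission time are illustrative.

```python
import math

MTBF_HOURS = 45_000.0  # per-computer MTBF from the original question


def pair_reliability(t_hours, mtbf_hours=MTBF_HOURS):
    """Reliability of two computers in parallel: 1 - (1 - R)(1 - R)."""
    f_single = 1.0 - math.exp(-t_hours / mtbf_hours)
    return 1.0 - f_single * f_single


def binomial_term(n, k, p):
    """Binomial probability of exactly k events in n trials, each with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


n_pairs = math.comb(20, 2)      # 190 pairings
r2 = pair_reliability(8760.0)   # pair reliability over one year
value = binomial_term(n_pairs, 1, r2)
print(value)
```

With a one-year mission time the result is far too small to be meaningful as anything other than "effectively zero".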
Based strictly on the computers' reliability performance, I would never expect to see this failure mode occur in the lifetime of anyone here. Best of luck!