MTBF Degradation of CPU's After Sustained Overtemp

(OP)
I am seeking reliability White Papers or other relevant info on MTBF calculations or "seat-of-the-pants" assessment of long term degradation of microprocessors used in common PC servers after sustained ambient overtemperature conditions.  

Background:  An operating server room's HVAC system failed over a weekend.  Recorded ambient and CPU temps approached 80-100 degrees C.  After the HVAC was restored, all systems appeared to function normally.  

Question:  For a given life expectancy of uP's, is it possible to assess or calculate a percentage loss-of-life reliability quotient?

Thanks, Cliff Michael
cliffmichael@netscape.net
918 625-1563
RE: MTBF Degradation of CPU's After Sustained Overtemp

48 to 72 hours at a moderately high temperature like that won't have a significant impact on a processor's reliability, but it will have some. For instance, "Intel’s internal goal is that the failure rates of systems in service be less than 1% cumulative at 7 years and less than 3% cumulative at 10 years" (see http://www.intel.com/design/PACKTECH/letter.htm). Now, I think that's pretty aggressive, when you consider how many tens of millions of processors they ship per year. While I'd have to spend a couple moments to research what current thinking is for activation energies of current uP technologies, if we just assume that an aggressive "rule of thumb" for acceleration of thermally activated failure mechanisms is 10x for every 10C above some base temperature, and we assume that base ambient is 50C (not unlikely for a CPU's local ambient in a server), then even the 100C temp (I'm assuming that's an ambient - not die temp. If it's die temp, "who cares" is the answer) results in only 3 days * 10 * 5 or 150 days reduction in life.

Now, we're playing with statistics here, and calculating actual impact on a single device is of course impossible (and even on your room full of devices). Given that a server's useful life is probably under 5 years, this 5 month impact is probably not much to worry about since, depending on supplier, the design life is 7 to 20 years...

Other things besides the processor are far more likely to show degraded performance due to that overtemp condition, such as electrolytic capacitors. So watch your power supplies...

RE: MTBF Degradation of CPU's After Sustained Overtemp

Oops - for accuracy's sake, let me correct this. If the acceleration factor is 10x per 10 degC, then a 50 degC increase in temp would result in a 10^5 (100k) acceleration factor, not 10 x 5. Fortunately, the more widely accepted acceleration factor "rule of thumb" is more like 2x for every 10 degC...that results in a more reasonable 2^5, or 32x, acceleration factor, so the 3 days at 100C are equivalent to about 96 days at the 50C base...still inconsequential. That's why processors should be able to withstand 1000 hours of lifetest at 125 degC with zero fallout.
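The corrected arithmetic above can be checked with a short sketch. It only restates the rule of thumb from this post (a fixed multiplier per 10 degC above a base temperature); the function names and the 50C base / 100C overtemp / 3 day figures are taken from the thread for illustration, and none of this is a qualified reliability model.

```python
def rule_of_thumb_af(delta_t_c, factor_per_step=2.0, step_c=10.0):
    """Acceleration factor for a 'factor_per_step x per step_c degC' rule of thumb."""
    return factor_per_step ** (delta_t_c / step_c)


def equivalent_days(exposure_days, acceleration_factor):
    """Days of aging at the base temperature equivalent to the overtemp exposure."""
    return exposure_days * acceleration_factor


# Figures assumed from the thread: 50C base local ambient, 100C overtemp, 3 days.
base_c = 50.0
overtemp_c = 100.0
exposure_days = 3.0

af_2x = rule_of_thumb_af(overtemp_c - base_c)         # 2x per 10 degC
af_10x = rule_of_thumb_af(overtemp_c - base_c, 10.0)  # 10x per 10 degC

print("2x rule:  AF =", af_2x, "->", equivalent_days(exposure_days, af_2x), "days")
print("10x rule: AF =", af_10x, "->", equivalent_days(exposure_days, af_10x), "days")
```

The multiplier compounds per 10 degC step, which is why the factor is raised to a power rather than multiplied by the number of steps.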

Sorry for any confusion.

Mike

RE: MTBF Degradation of CPU's After Sustained Overtemp

Hi, I've been walking around with a question in my head and, after seeing this thread, thought I'd ask it here. Apart from abnormal conditions (high temperature, etc.), does electronic equipment age? Or is aging just the accumulated effect of, say, overtemps, spikes, surges, etc.? Are 'aging' and 'degradation' the same thing?

RE: MTBF Degradation of CPU's After Sustained Overtemp

In short, electronics do age, independent of overages.

Classic failure rate analysis is governed by the Arrhenius equation. Thus, all non-catastrophic overages represent accelerations of the basic failure rate; hence, burn-in can be used to accelerate failures and/or to predict what the failure rate will be for given conditions.
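For reference, the Arrhenius acceleration factor mentioned above is usually written as AF = exp((Ea/k) * (1/T_use - 1/T_stress)), with temperatures in kelvin. The sketch below evaluates it for the 50C-to-100C case discussed earlier in the thread. The activation energy of 0.7 eV is purely an assumed placeholder (the thread does not give one, and the right value depends on the failure mechanism and process), and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(t_use_c, t_stress_c, ea_ev):
    """Arrhenius acceleration factor between a use and a stress temperature (degC)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# ASSUMPTION: Ea = 0.7 eV is an illustrative value only, not taken from the thread.
assumed_ea_ev = 0.7
af = arrhenius_af(50.0, 100.0, assumed_ea_ev)
print("Arrhenius AF, 50C -> 100C, Ea = 0.7 eV (assumed):", round(af, 1))
print("Equivalent days for a 3 day exposure:", round(3.0 * af, 1))
```

With that assumed activation energy the result lands in the same range as the 2x-per-10-degC rule of thumb; a different Ea would shift it considerably, so the number should be treated as an order-of-magnitude check only.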

TTFN

RE: MTBF Degradation of CPU's After Sustained Overtemp

Electronic equipment does age. There is generally a shelf life associated with each component that makes up the equipment. Most notable are electrolytic capacitors, which tend to dry up with time.

RE: MTBF Degradation of CPU's After Sustained Overtemp

I may cause a hornet's nest to be disturbed here, but over the last five years (maybe more) there has come to be a general belief that temperature alone does not affect the reliability of semiconductors as much as was once thought, especially not VLSI.  Also, surface mount components seem to be more reliable than their through-hole counterparts.

The real killers for electronics reliability are temperature cycling and vibration.  These create mechanical failures in areas we don't usually pay enough attention to, soldering and plated through holes to inner layers for example.

RE: MTBF Degradation of CPU's After Sustained Overtemp

sreid, no hornet's nest at all. It's my experience that mechanical failure is the first order of magnitude problem with most assemblies...MIL-HDBK-217 and Bellcore based reliability calculations/estimates reflect that fact. It's not "electronics reliability", though, that's excessively negatively affected by temp cycle/vibration, it's mechanical reliability. So that's one reason good design practice might include part count reduction and integration....fewer solder joints.

Reliability of semiconductor devices is affected by both temperature and electric field. Hot-electron effects, for instance, are only mildly affected by temperature, but raise the voltage and watch out! Punch-through in short-channel devices (and similar problems like leakage) is voltage-driven; that's the main reason VDD keeps going down in new generations of devices....they'd fail in a second operating at 5V!

Mike

--
Mike Kirschner
Design Chain Associates, LLC
http://www.designchainassociates.com
