hard disk failures
(OP)
We have a factory environment with a lot of electronically controlled machinery. In the past two years, there have been eight occasions of hard disk failure (typically 4 GB drives) in PLC machines. In all cases, the failures can be correlated with power trips.
Question: Can this be traced to poor earth resistance? What should the earth resistance value be in a production environment with a total load of 2000 kW on two transformers? Is an earth resistance of 10 ohms considered too high for such conditions?
We would like to hear from anyone with similar experiences.
RE: hard disk failures
Perhaps some line conditioning is in order.
You need to investigate whether the damage occurs because of the trip itself or because of the restart.
TTFN
RE: hard disk failures
> vibration - any possibility that the power trips cause some sort of mechanical shock to the drives? Is the environment generally vibrating?
> what is the exact failure mode of the drive? Is it complete non-functionality or damaged sectors?
> Could the PLCs behave oddly during power trips?
> 4 GB drives are probably 4 years old or so? Even assuming only 8 hours of operation per day, they're approaching 12,000 hours of on-time.
TTFN
RE: hard disk failures
Without power-quality instrumentation (portable or fixed) to specifically associate power-quality events with equipment-performance events, it's hard to judge. More than grounding-electrode resistance, potential differences from inadequate bonding of circuits and enclosures are the more likely culprit.
Don't forget that with high-speed signals through metallic paths, the resistive and reactive characteristics of many individual bonding connections must be understood. That is rarely a trivial undertaking.
RE: hard disk failures
Grounding resistance might not matter if everything is designed and implemented correctly, but any ground loop could cause problems during massive transients. Someone should review how the PLCs are connected to the machinery and whether there are any sneak paths.
TTFN
RE: hard disk failures
Is there any transient voltage surge suppression (TVSS) at the source panel? If not that would probably be my first recommendation.
RE: hard disk failures
I thank all of you for your suggestions. To add to the inputs, the following may be noted.
All the hard disks that failed are in PCs connected to a UPS.
Since these are PLC panels designed by well-known and established manufacturers, all the usual circuit protection and protective devices are in place.
There are panel-cooling air conditioners mounted on the panel doors. Their vibrations are not likely to reach the hard disks.
The general environment is not vibrating, in answer to IRstuff's question.
The type of failure is bad media (track 0 bad, etc.), always after a power trip, even when the UPS keeps the PCs on.
The hard disks are on 24 hours a day, 7 days a week, 300 days a year. The plant itself is 3 years old, so all the hard disks are less than 3 years old.
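As a rough check on the on-time figures, the arithmetic can be sketched as below. The helper name `power_on_hours` is just an illustration; the inputs are the numbers quoted in this thread (24 h/day, 300 days/year, 3 years, versus the earlier guess of 8 h/day over about 4 years).

```python
def power_on_hours(hours_per_day, days_per_year, years):
    """Accumulated power-on hours for a drive that runs on a fixed schedule."""
    return hours_per_day * days_per_year * years


# Figures reported for this plant: 24 h/day, 300 days/year, 3 years.
plant_hours = power_on_hours(24, 300, 3)      # 21600

# Earlier guess in the thread: 8 h/day, every day, for about 4 years.
estimated_hours = power_on_hours(8, 365, 4)   # 11680

print(plant_hours, estimated_hours)
```

So, on the stated schedule, the oldest drives would have roughly 21,600 hours of on-time, close to double the earlier 12,000-hour guess.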
If surges are damaging the hard disks, why are the SMPS units never damaged? Nothing happens to the PLC cards. Nothing goes wrong with the UPS. As peebee asked, there are no external surge suppressors at the source; however, power factor capacitors are installed at a centralised location.
We have started watering the earth pits as a first step, since it can be done immediately and at no cost.
Thanks everybody.
RE: hard disk failures
The two possibilities are some sort of head crash, although that shouldn't happen with the 4 GB generation, or some transient problem with the PLCs that causes inadvertent write operations during power glitches.
Older generations of disks required some careful head parking to prevent damage to the disk during uncontrolled powerdown events. If you have access to the disk BIOS or other control of the head movement, you might try to see if you can get the computers to park the heads on the farthest track after each operation.
Likewise, parking the heads will also prevent inadvertent write operations during power trips.
TTFN
RE: hard disk failures
To add further, here is another observation about the hard disks. All of them are mounted vertically by the control panel manufacturers. We do not know whether the HD manufacturer has any rule against vertical mounting.
Secondly, you suggested parking the HD after each operation on the hard disk. Can you explain this in more detail? The PCs are used as HMIs and communicate with the PLCs. The software running on the PCs also does data trend analysis, etc.
Is it possible to park the hard disk in a situation like this?
RE: hard disk failures
"Parking" is the process of moving the head onto an unused track. Normally that would be 1+the highest track number. Since the head is positioned by a stepper motor, the position will be held regardless of orientation.
Obviously, it's not clear that this will solve your problem, but given the failure mode, it might be a plausible patch.
As indicated earlier, some additional failure analysis needs to occur to determine the degree of damage incurred and to possibly trace back to root cause.
TTFN
RE: hard disk failures
Don't count on your UPS to do much in the way of filtering power. This is a common misperception. A spike at the UPS input will transmit to the UPS output. UPS's generally make for poor power filters, and the output voltage waveform is usually worse than the input waveform.
Again, I'd recommend installing TVSS, even if it's just a cheap power strip with TVSS at each of the hard drive plugs. It sounds like you're getting damaging spikes, and the easiest way to get rid of them is surge suppression.
RE: hard disk failures
Here's a suggestion. Your hard drives are in PCs running a SCADA system over the PLCs. You are recording trend data and displaying plant parameters to the operations and engineering departments.
I would guess that you have very fine resolution on your trend data and can read down to, say, less than 30 seconds per sample. I would also guess that your data tags are resident on the disk drive, not in memory, for reliability, and that the 4 GB drives were selected to give extra space for recording data.
Now all this is supposition, but I have seen it again and again. If you watch the hard drive indicator, I suspect it doesn't take a break. Calculate how many reads and writes you're doing.
I suggest that the drives are being overworked, and when the PCs try to reboot they can't find sector zero to initialise your operating system.
As a suggestion, lower your sample rates, keep more of your tags in memory, and back up data on a batch basis.
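A back-of-the-envelope way to do that calculation is sketched below. The tag count (500) and the batch size (120 samples per write) are made-up illustration values, not figures from this plant; only the 30-second sample interval comes from the guess above. It also assumes one disk write per tag per sample when the tags are disk-resident, which may not match how a given SCADA package behaves.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_day(tag_count, sample_interval_s, samples_per_write=1):
    """Estimate disk write operations per day for trend logging.

    Assumes each tag produces one sample every sample_interval_s seconds and
    that samples_per_write samples are grouped into a single disk write.
    """
    samples = tag_count * SECONDS_PER_DAY / sample_interval_s
    return samples / samples_per_write


# Hypothetical: 500 disk-resident tags, one write per sample, 30 s interval.
unbatched = writes_per_day(500, 30)                       # about 1.44 million

# Same tags held in memory and written out 120 samples at a time.
batched = writes_per_day(500, 30, samples_per_write=120)  # about 12 thousand

print(f"{unbatched:,.0f} writes/day unbatched, {batched:,.0f} batched")
```

With your real tag count and sample interval plugged in, this gives a feel for how much batching would reduce the number of times the drive is touched.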
Hope this helps a bit,
I'm open to any other ideas or suggestions
Regards
Don
RE: hard disk failures
peebee has a valid point about a lot of UPS units. The surge suppressor would be cheap insurance, even against other potential troubles.
Don
RE: hard disk failures
Regarding a UPS system isolating noise: it won't. It will attenuate input-side noise to some degree, but it won't isolate it. A UPS is essentially an AC-to-DC-to-AC converter. If you get a spike on the input, it will show up on the DC bus as well as the AC output, and can possibly damage the UPS and downstream equipment. Each conversion stage will attenuate the noise to some degree but won't eliminate it. Compared to the cost of the computer equipment and the facility, the price of a TVSS is negligible.
RE: hard disk failures
http://www.inventgineering.com/
RE: hard disk failures
The PC power supplies are fully isolated from the mains, which dramatically reduces the prospect of damage to a drive. Other connections, such as serial links, might couple in noise that could damage a motherboard, but NOT the drives, which live an isolated life. All of a drive's signals are generated and terminated on the motherboard, with the exception of the isolated power. If you were having voltage spikes that actually damaged the drives, you would have motherboards blowing like cheap fuses!
I have built, used, and troubleshot literally hundreds of PCs used in industrial control, supervisory control and data acquisition (SCADA), and trending applications. Without exception, historical and trending SOFTWARE was always the problem in drive corruption situations. If the data is not cached and then occasionally written to the drive, but instead written constantly to the drive, disaster is almost certain. All that is required is a missed write, a missed interrupt, a missed DMA process, a lost bit, a stray cosmic ray, an unrelated software bug... the list goes on and on, and you will get lost indexes, blown file allocation tables, and damaged boot records. This is because the data writing process is a little hazardous. Each chunk of data written to the drive requires modifications to drive tables, indexes, and the like. If the drive is updating one of these and there is any hiccup, the pointers to where things are can be scrambled. Once they are scrambled, computers are masters at rapidly compounding the disaster.
Proper data logging requires sending data to the drives as infrequently as possible. If the data is of a critical nature (really, seriously critical), then it must be written at different times to different drives, preferably on different computers. If it is written occasionally, the external opportunities for corruption are dramatically reduced, sometimes by orders of magnitude.
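To make the "cache it, then write occasionally" idea concrete, here is a minimal sketch. It is not how any particular SCADA or trending package works; `BatchedLogger`, its parameters, and the thresholds are all hypothetical names chosen for illustration. Records are held in memory and appended to the file in one go when either a record count or an age limit is reached.

```python
import os
import time


class BatchedLogger:
    """Buffer log records in memory and write them to disk in batches."""

    def __init__(self, path, max_records=1000, max_age_s=60.0,
                 clock=time.monotonic):
        self.path = path
        self.max_records = max_records   # flush after this many records
        self.max_age_s = max_age_s       # or after this many seconds
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()
        self.flush_count = 0             # number of actual disk writes

    def log(self, record):
        """Queue one text record; touch the disk only when a limit is hit."""
        self._buffer.append(record)
        too_many = len(self._buffer) >= self.max_records
        too_old = self._clock() - self._last_flush >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Append all buffered records in a single write and sync to disk."""
        if self._buffer:
            with open(self.path, "a", encoding="utf-8") as f:
                f.write("\n".join(self._buffer) + "\n")
                f.flush()
                os.fsync(f.fileno())
            self._buffer.clear()
            self.flush_count += 1
        self._last_flush = self._clock()
```

The trade-off is that whatever sits in the buffer is lost if power drops before the next flush, so the limits have to be chosen against how much trend data you can afford to lose.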
What I'm saying is that the problem you're having is software, NOT hardware. If you kicked the plug out of your computer every time it finished booting, you would not lose your drives, even though this could be considered a large power disturbance. I suggest that a brief system disturbance occurs that in all likelihood would not otherwise cause a detected problem, but because the data logging is ongoing, logging data gets written to the boot sector or the file allocation table, resulting in "track 0 bad", etc.
That said, do also make sure you have MOVs (metal oxide varistors) or TVSS on all supply lines to all computers in an industrial facility. ALWAYS! They do work and they cost virtually nothing! They should always be connected line to neutral, line to ground, and neutral to ground.
Get your software fixed and always practice safe hex!