Thursday, March 1, 2018

Multiple Enterprise-level Western Digital hard drives failing



In 2007, I built a 2.5TB server with five 500GB WDC RE2 hard drives in a RAID5 configuration. Since these are Western Digital's Enterprise-level drives, they come with a 5-year warranty, but I have been seeing them die much sooner than that. I lost one drive in 2009, and last year two drives failed close together. Since I had to rebuild the server from backups, I decided to rebuild it with only 4 drives using RAID6, since we didn't need all that space. All replacement drives have been WDC RE3 drives, and none of them has failed yet.

Two weeks ago, the fourth original RE2 drive failed with a serious hardware malfunction. It sounded very sickly and was completely non-responsive. It did not register on the system, so I couldn't run any S.M.A.R.T. tests or even read its serial number. Two days ago, the last RE2 drive failed, though less severely. It had an unrecoverable sector read error and was unable to reallocate the sector, but was otherwise fully functional. S.M.A.R.T. even claimed the drive was still healthy. It does not seem normal to lose five Enterprise-level drives with a year still left on their warranty.
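For anyone weighing the same RAID5-to-RAID6 switch, here is a rough sketch of the capacity trade-off. It assumes the textbook layout where RAID5 gives up one drive's worth of space to parity and RAID6 gives up two, and it ignores filesystem overhead and the decimal-vs-binary GB difference. The function name is just for illustration. The "2.5TB" figure above is the raw total of the five drives, not the usable space.

```python
def usable_capacity_gb(num_drives: int, drive_gb: int, parity_drives: int) -> int:
    """Usable space when `parity_drives` drives' worth of capacity holds parity."""
    if num_drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (num_drives - parity_drives) * drive_gb


# RAID5 reserves one drive's worth of parity, RAID6 reserves two.
raid5 = usable_capacity_gb(5, 500, parity_drives=1)
raid6 = usable_capacity_gb(4, 500, parity_drives=2)

print(f"RAID5, 5 x 500GB: {raid5} GB usable, survives 1 drive failure")
print(f"RAID6, 4 x 500GB: {raid6} GB usable, survives 2 drive failures")
```

So the rebuild gives up half the usable space (2000GB down to 1000GB) in exchange for surviving a second failure during a rebuild, which matters when drives from the same batch tend to fail close together.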




When I lost the two drives last year, I added extra fans to improve cooling. Currently, the three responsive drives report temperatures of 30°C, well within their 0-60°C operating range. Someone suggested that my power supply might be underrated. It's a 500W supply, but because it came with only two SATA power connectors, I am powering all but two of the SATA drives from old-style Molex connectors. Could that be a problem?



Unfortunately, the new RE3 drives only have a SATA power connector, so I will have to get either a splitter or a different power supply once I replace this next disk.



Or is it possible I just purchased five drives from a bad batch?


Answer



I must say that, sadly, this pretty much matches our experience with WD RE2 drives. Out of our small fleet of 10 drives, we had to replace 6 in the first two years, and some of the replacements WD sent us failed rather quickly as well. I believe the "bad batch" spans the entire RE2 series, as none of our other installed drive types show failure numbers even of the same magnitude as the WD RE2 drives.

