Tuesday, May 31, 2016

smartctl - e2fsck found no errors, but S.M.A.R.T. self test fails



I have an external Freecom HDD (Samsung drive inside), connected via USB and using its own power supply.



The disk disconnects itself at random intervals (anywhere from a few hours to a month). I tend to blame the operating system, because the same drive worked without problems when connected to a USB port of a TP-Link router.



Anyway, just to be sure, I ran an extended SMART self-test using smartctl, and it finished with the message "Completed: read failure 30%".
So I ran an additional check using e2fsck. It took a whole night on this 1.5 TB drive, and it completed with no errors at all.



I am pretty confused: should I trust the SMART self-test or the e2fsck results? Also, the SMART health status is 'PASSED' and the short self-test is fine, too.

The usual suspects have been checked: the USB cable has been replaced with a new one, and the external power supply has been tested.
Any ideas?
Should I buy a new drive, or am I safe? Which is the more reliable indicator of drive health, SMART or e2fsck?


Answer



The SMART result means that the hard drive is failing. It is very likely to fail completely, and soon, so you should retire it as a matter of urgency. The fact that e2fsck returns no errors means that the incipient failures have not yet corrupted your data (or, to be more precise, have not yet corrupted the file system which houses your data: e2fsck doesn't check every bit of the data).



You may find, when you copy all the data off that drive - which you should do today - that you can read all the data. This means that the blocks which have so far failed and are unreadable do not hold any of the data; they are just unallocated blocks. The emptier the FS, and the fewer the failures, the more likely you are to get away with it.
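To see whether a failed block is one of those unallocated ones, the failing sector has to be translated into a file system block number first. The sketch below shows only that arithmetic. It assumes the SMART self-test log reports a failing LBA (the LBA_of_first_error column), that you know the LBA at which the partition starts, and that sectors are 512 bytes and file system blocks 4096 bytes. The function name and all numbers are hypothetical, chosen for illustration.

```python
def lba_to_fs_block(failing_lba, partition_start_lba,
                    fs_block_size=4096, sector_size=512):
    """Translate a whole-disk LBA into a block number inside one partition.

    All parameters are assumptions supplied by the caller: the failing LBA
    from the SMART self-test log, the partition's starting LBA from the
    partition table, and the file system block size.
    """
    if failing_lba < partition_start_lba:
        raise ValueError("failing LBA lies before the start of this partition")
    # Byte offset of the bad sector relative to the start of the partition,
    # divided (rounding down) by the file system block size.
    byte_offset = (failing_lba - partition_start_lba) * sector_size
    return byte_offset // fs_block_size


# Hypothetical example: partition starts at LBA 2048, bad sector at LBA 2128.
print(lba_to_fs_block(2128, 2048))  # prints 10
```

With the block number in hand, a file system debugger can tell you whether that block is in use and, if so, by which file.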



You may also find that the copying tool fails on reading one or more blocks which make up a file. If this happens, you'll have to shrug, and regard that file as corrupted. You'll also need to use a tool that is tolerant of block read errors and won't just stop dead when it hits the first one. I prefer dumpe2fs, but I'm an ancient relic.
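To make "tolerant of block read errors" concrete, here is a minimal sketch of the idea, not a replacement for a dedicated rescue tool. It copies a regular file block by block, fills any block that cannot be read with zeros, and carries on. The function name `tolerant_copy`, the 4096-byte block size, and the injectable `read_block` hook are all assumptions made for this illustration; it relies on `os.pread`, so it is Unix-only.

```python
import os


def tolerant_copy(src_path, dst_path, block_size=4096, read_block=None):
    """Copy src_path to dst_path, zero-filling blocks that fail to read.

    Returns the list of byte offsets at which a read error occurred.
    read_block is a hypothetical hook (fd, size, offset) -> bytes, so the
    error handling can be exercised without a failing disk.
    """
    if read_block is None:
        read_block = os.pread  # positional read, independent of file offset
    bad_offsets = []
    total = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total:
                want = min(block_size, total - offset)
                try:
                    data = read_block(src_fd, want, offset)
                except OSError:
                    # Unreadable block: record it and substitute zeros so
                    # that everything after it keeps its original offset.
                    bad_offsets.append(offset)
                    data = b"\x00" * want
                if not data:
                    break  # file ended earlier than its reported size
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src_fd)
    return bad_offsets
```

Any file for which the returned list is not empty is the kind you would have to regard as corrupted, and the offsets tell you which parts of it were lost.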




However you slice and dice it, the famous Google paper is clear: SMART errors of the kind smartctl is reporting are a strong predictor of imminent failure. Get your data off that drive today, and if at all possible, get it out of service. And if it turns out you get it all, consider buying a lottery ticket: you're a lucky person!

