I would like to run a SMART offline test on one of my hard disks (internal SATA). The machine is running Ubuntu 14.04, so I simply do smartctl -t offline /dev/sdb (as root). It starts the test and gives me an estimated time of completion.
The drive in question holds the system's root fs, so it's being actively (but not heavily) used. So when I later run smartctl -a /dev/sdb, even well after the estimated time, I see "Offline data collection status: (0x04) Offline data collection activity was suspended by an interrupting command from host." It's not clear if the test is ever going to finish.
My understanding is that the offline test essentially checks every sector on the disk to see if it can be read. When the computer accesses the disk, the test is suspended and resumed after the command finishes. But it seems that there are enough commands being sent that very little time is spent on the test, so it progresses extremely slowly or not at all. (I also wonder if there is an intentional delay between the completion of the command and the resumption of the test, to avoid switching back and forth too frequently.)
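The status line quoted above carries a hex code that can be decoded mechanically. The following is a minimal Python sketch, not part of smartmontools, that pulls the code out of saved smartctl -a output and maps it to a short label. The state table is an assumption based on the strings smartctl prints and on the usual layout of the ATA status byte (bit 7 taken to mean that automatic offline collection is enabled); the names offline_status and OFFLINE_STATES and the sample text are illustrative only.

```python
import re

# Assumed meaning of the low 7 bits of the offline data collection status byte.
# Bit 7 (0x80) is assumed to flag "Auto Offline Data Collection: Enabled".
OFFLINE_STATES = {
    0x00: "never started",
    0x02: "completed without error",
    0x03: "in progress",
    0x04: "suspended by an interrupting command from host",
    0x05: "aborted by an interrupting command from host",
    0x06: "aborted by the device with a fatal error",
}

# Matches e.g. "Offline data collection status:  (0x04)" in smartctl output.
STATUS_RE = re.compile(r"Offline data collection status:\s*\(0x([0-9a-fA-F]{2})\)")


def offline_status(smartctl_output):
    """Return (raw_code, description), or None if the status line is absent."""
    match = STATUS_RE.search(smartctl_output)
    if match is None:
        return None
    code = int(match.group(1), 16)
    state = code & 0x7F  # drop the assumed "auto offline enabled" bit
    return code, OFFLINE_STATES.get(state, "unknown or vendor-specific")


# Sample text modelled on the message quoted in the question.
sample = (
    "Offline data collection status:  (0x04)\tOffline data collection activity\n"
    "\t\t\t\t\twas suspended by an interrupting command from host.\n"
)
print(offline_status(sample))
# prints: (4, 'suspended by an interrupting command from host')
```

A code of 0x04 only says the drive was suspended at the moment of the query; it does not by itself show whether the test made progress between queries.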
Is there any way to somehow prioritize the SMART test higher, so that it makes progress at a reasonable rate, while still keeping the disk accessible? It would be fine if disk access is slower, as long as the system can still run. I know about captive mode -C, but this would make the system unusable while the test runs. I could of course boot from another disk and run the SMART test with sdb unmounted, but that also would make the system effectively unusable for the duration (and it requires physical access to the machine, which happens to be inconvenient).
I saw "SMART-Test never finishes", but that seems to be the opposite problem: the disk has no activity and enters standby mode. I see no evidence of that being the case here, as the disk is active. I also saw "SMART short offline test never ends for all drives of a RAID1" on ServerFault, but the answer suggests controller or cabling problems, which I have no reason to suspect as the drive is generally working fine.
(As an aside: is there any way to check the progress of an offline test? That would give me some idea as to whether the test has any chance of completing in a reasonable amount of time.)
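Absent a known percentage readout for offline data collection, one rough proxy is to sample the status code occasionally and tally how often the drive reports each state. The sketch below assumes Python 3.5+ and that smartctl -c prints the same "Offline data collection status" line as smartctl -a; the function names, the default device, and the sampling interval are hypothetical. Note that each query is itself a command sent to the drive, so it may count as an interruption; a long interval is probably wise.

```python
import re
import subprocess
import time
from collections import Counter

STATUS_RE = re.compile(r"Offline data collection status:\s*\(0x([0-9a-fA-F]{2})\)")


def read_status_code(device="/dev/sdb"):
    """Run `smartctl -c DEVICE` (needs root) and return the status byte, or None."""
    result = subprocess.run(
        ["smartctl", "-c", device],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,  # smartctl uses its exit status as a bitmask, so do not raise
    )
    match = STATUS_RE.search(result.stdout)
    return int(match.group(1), 16) if match else None


def tally(codes):
    """Count how often each state (low 7 bits) appears; skip failed reads."""
    return Counter(code & 0x7F for code in codes if code is not None)


def sample_statuses(device="/dev/sdb", samples=10, interval=600,
                    reader=read_status_code):
    """Read the status `samples` times, sleeping `interval` seconds in between."""
    codes = []
    for i in range(samples):
        codes.append(reader(device))
        if i + 1 < samples:
            time.sleep(interval)
    return tally(codes)
```

Calling sample_statuses("/dev/sdb") as root would return a Counter keyed by state code. If state 0x04 (suspended) dominates and 0x02 (completed) never appears, that suggests the test is starved, though it still gives no measure of how many sectors have been covered.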