Tuesday, November 18, 2014

hard drive - What does SMART testing do and how does it work?



man smartctl states (SNIPPED for brevity):




The first category, called "online" testing. The second category of testing is called "offline" testing. Normally, the disk will suspend offline testing while disk accesses are taking place, and then automatically resume it when the disk would otherwise be idle. The third category of testing (and the only category for which the word 'testing' is really an appropriate choice) is "self" testing.




Enables or disables SMART automatic offline test, which scans the drive every four hours for disk defects. This command can be given during normal system operation.




Who runs the test - the drive firmware? What sort of tests are these - does the firmware read from or write to the disk, and what exactly goes on? Is it safe to invoke testing while the OS (Linux) is running, or can one schedule a test for later? If so, how does that take place - when you reboot, at the BIOS prompt ('offline test')? Where are the results displayed - in the SMART logs?


Answer




  1. The drive firmware runs the tests.


  2. The details of the tests can be read in, e.g., www.t13.org/Documents/UploadedDocuments/technical/e01137r0.pdf, which summarises the elements of the short and long tests thus:





    1. an electrical segment wherein the drive tests its own electronics. The particular tests in this segment
      are vendor specific, but as examples: this segment might include such tests as a buffer RAM test, a
      read/write circuitry test, and/or a test of the read/write head elements.


    2. a seek/servo segment wherein the drive tests its capability to find and servo on data tracks. The
      particular methodology used in this test is also vendor specific.


    3. a read/verify scan segment wherein the drive performs read scanning of some portion of the disk
      surface. The amount and location of the surface scanned are dependent on the completion time
      constraint and are vendor specific.


    The criteria for the extended self-test are the same as the short self-test with two exceptions: segment
      (3) of the extended self-test shall be a read/verify scan of all of the user data area, and there is no
      maximum time limit for the drive to perform the test.



  3. It is safe to perform non-destructive testing while the OS is running, though some performance impact is likely. As the smartctl man page says for both -t short and -t long,





This command can be given in normal system operation (unless run in captive mode)





If you invoke captive mode with -C, smartctl assumes the drive can be kept busy, to the point of being unavailable, for the duration of the test. This should not be done on a drive the OS is using.
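
As a minimal sketch of starting a test on a running system, the shell function below wraps smartctl's -t short, -t long and -X (abort) options and deliberately never passes -C. The function name run_selftest, the DRY_RUN variable and the device path /dev/sdX are assumptions made for illustration; they are not part of smartmontools.

```shell
#!/bin/sh
# Start or abort a SMART self-test without ever using captive mode (-C).
# run_selftest and DRY_RUN are hypothetical names chosen for this sketch.
# Set DRY_RUN=1 to print the command instead of running it.
run_selftest() {
    action=$1
    device=$2
    case "$action" in
        short|long) set -- smartctl -t "$action" "$device" ;;  # background self-test
        abort)      set -- smartctl -X "$device" ;;            # abort a non-captive test
        *)          echo "usage: run_selftest short|long|abort DEVICE" >&2
                    return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With DRY_RUN=1 set, calling run_selftest long /dev/sdX prints the smartctl command line without touching the drive; without it, the command is executed and normally needs root privileges.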



As the man page also suggests, the offline testing (which simply means periodic background testing) is not reliable, and never officially became part of the ATA specifications. I run my tests from cron instead; that way I know when they should happen, and I can stop them if I need to.
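
A cron schedule along these lines might look like the following fragment for root's crontab. This is only a sketch: the times, the /usr/sbin/smartctl path and the device /dev/sdX are assumptions, so adjust them to your own system.

```crontab
# m  h  dom mon dow  command
# Extended (long) self-test early on Sunday morning (assumed schedule).
0    2  *   *   0    /usr/sbin/smartctl -t long /dev/sdX
# Short self-test on the remaining days of the week.
0    3  *   *   1-6  /usr/sbin/smartctl -t short /dev/sdX
```

A test started this way runs in the background on the drive, and can be stopped with smartctl -X /dev/sdX if it gets in the way.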




  4. The results can be seen in the smartctl output. Here's one with a test running:




[root@risby images]# smartctl -a /dev/sdb

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.6-201.fc22.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
[...]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     20567         -
# 2  Extended offline    Completed without error       00%       486         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Self_test_in_progress [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing


Note two previous completed tests (at 486 and 20567 hours power-on, respectively) and the current running one (10% complete).
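
If you want to check the self-test log from a script rather than by eye, something like the following could work. It is a rough sketch that assumes the log layout shown above (entries starting with "#" and a number); the function name unclean_selftests is made up for this example. It prints every finished entry whose status is anything other than "Completed without error", which includes aborted and interrupted tests as well as real failures, and it skips a test that is still in progress.

```shell
#!/bin/sh
# Read smartctl self-test log text on stdin. Print each log entry that did
# not finish with "Completed without error" (entries still in progress are
# skipped) and return 1 if at least one such entry was found, else 0.
# unclean_selftests is a hypothetical helper name, not a smartmontools tool.
unclean_selftests() {
    awk '
        /^# *[0-9]+ / && !/Completed without error/ && !/in progress/ {
            print
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

It would typically be fed from the self-test log alone, for example smartctl -l selftest /dev/sdX | unclean_selftests, where /dev/sdX is a placeholder for your drive.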

