Wednesday, May 10, 2017

raid1 - Rebuilding Raid-arrays



How can I rebuild RAID arrays?
I am using RAID 1.
My datacenter says the array needs to be fixed. At first I thought one of the HDDs was faulty because of the smartmontools scan results, but it is not.



command:



cat /proc/mdstat




output:



Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
2096064 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
524224 blocks [2/1] [U_]


md2 : active raid1 sda3[0]
729952192 blocks [2/1] [U_]

unused devices:


Do I need to run:



# mdadm /dev/md1 -r /dev/sdb2
# mdadm /dev/md2 -r /dev/sdb3
# mdadm /dev/md3 -r /dev/sdb4


and then



# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
# mdadm /dev/md3 -a /dev/sdb4
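(For reference, once the partitions are re-added the resync runs in the background. A sketch of watching its progress, using the device names from the mdstat output above:)

```shell
# Watch resync progress of all md arrays, refreshing every 2 seconds
watch -n 2 cat /proc/mdstat

# Or query a single array in detail (state, rebuild percentage, member disks)
mdadm --detail /dev/md2
```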



Will I lose data, or will my server go offline?



Here is the output of fdisk -l:



Disk /dev/sda: 750.1 GB, 750156374016 bytes
64 heads, 32 sectors/track, 715404 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sda1 2 2048 2096128 fd Linux raid autodetect
/dev/sda2 2049 2560 524288 fd Linux raid autodetect
/dev/sda3 2561 715404 729952256 fd Linux raid autodetect

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
64 heads, 32 sectors/track, 715404 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 2 2048 2096128 fd Linux raid autodetect
/dev/sdb2 2049 2560 524288 fd Linux raid autodetect
/dev/sdb3 2561 715404 729952256 fd Linux raid autodetect

Disk /dev/md2: 747.4 GB, 747471044608 bytes
2 heads, 4 sectors/track, 182488048 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 536 MB, 536805376 bytes
2 heads, 4 sectors/track, 131056 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 2146 MB, 2146369536 bytes
2 heads, 4 sectors/track, 524016 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table



Here is the output of smartctl -A /dev/sdb:



=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 100 006 Pre-fail Always - 38042073
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 7
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 24494887
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 7935
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 7
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 062 052 045 Old_age Always - 38 (Min/Max 34/41)
194 Temperature_Celsius 0x0022 038 048 000 Old_age Always - 38 (0 26 0 0 0)
195 Hardware_ECC_Recovered 0x001a 032 026 000 Old_age Always - 38042073
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 101494372179726
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3317006641
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 2924590852


Answer



That drive, sdb, looks like it's not far from failing. It hasn't officially failed yet, but it doesn't have much life left in it.



195 Hardware_ECC_Recovered  0x001a   032   026   000    Old_age   Always       -       38042073


This drive has had a large number of recoverable read errors, meaning it successfully reconstructed the data using error correction. However, it's getting to the point where it will most likely soon hit an unrecoverable read error, where it cannot reconstruct the data on a damaged or failing section of the disk. At that point there's nothing you can do but replace the drive.
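One way to probe for that kind of surface damage is a SMART self-test. A sketch, assuming smartmontools is installed and the suspect drive is /dev/sdb as above:

```shell
# Queue a long (full-surface) self-test; it runs inside the drive's firmware
smartctl -t long /dev/sdb

# Once it finishes (smartctl prints an estimated duration), check the result
smartctl -l selftest /dev/sdb

# The attributes that most directly indicate surface damage
smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```

A long test that aborts with a read failure at a specific LBA is strong confirmation the drive needs replacing.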



If your rebuild keeps stopping at the same place, it's entirely possible the drive has already failed at that point on the platters and simply isn't reporting it. Desktop-class drives will stop and retry for minutes or even hours to read a particular sector if the first attempt fails, which leads to exactly this sort of behavior. And you probably have such a drive in this "server"...
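You can check for unreadable sectors directly with a full read pass. A sketch (read-only, but it will hammer the drive for hours; the status=progress flag needs a reasonably recent GNU coreutils):

```shell
# Read every block of sdb and discard the data; an unreadable sector
# shows up as an I/O error in dmesg with its offset
dd if=/dev/sdb of=/dev/null bs=1M conv=noerror status=progress
```

badblocks -sv /dev/sdb is an alternative read-only surface scan that reports bad blocks explicitly.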




At this point you should have that drive proactively replaced, since it is going to fail soon, if it hasn't already.
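A rough outline of the replacement, hedged: the exact steps depend on the hoster and on whether the box hot-swaps, and the commands below assume the new disk comes back as /dev/sdb and that the sdb partitions are still listed as members in /proc/mdstat (arrays already showing [U_] without an sdb member can skip the fail/remove step):

```shell
# Mark the old disk's partitions as failed and pull them from the arrays
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

# (physically swap the drive)

# Copy the partition table from the surviving disk to the new one
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Add the new partitions; the kernel resyncs the arrays in the background
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3

# If the machine boots from md0, reinstall the boot loader on the new disk
grub-install /dev/sdb
```

The server stays online throughout; the arrays just run degraded until the resync finishes.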

