Wednesday, May 10, 2017

raid1 - Rebuilding Raid-arrays



How can I rebuild RAID arrays?
I am using RAID 1.
My datacenter says the array needs to be fixed. At first I thought one of the HDDs was faulty because of the smartmontools scan results, but it is not.



command:



cat /proc/mdstat




output:



Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
2096064 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
524224 blocks [2/1] [U_]


md2 : active raid1 sda3[0]
729952192 blocks [2/1] [U_]

unused devices:


Do I need to run:



# mdadm /dev/md1 -r /dev/sdb2
# mdadm /dev/md2 -r /dev/sdb3
# mdadm /dev/md3 -r /dev/sdb4


and then



# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
# mdadm /dev/md3 -a /dev/sdb4
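(For reference, once the partitions are re-added the resync runs in the background. A sketch of watching its progress, using the device names from the mdstat output above:)

```shell
# Watch resync progress of all md arrays, refreshing every 2 seconds
watch -n 2 cat /proc/mdstat

# Or query a single array in detail (state, rebuild percentage, member disks)
mdadm --detail /dev/md2
```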



Will I lose data, or will my server go offline?



Here is the output of fdisk -l:



Disk /dev/sda: 750.1 GB, 750156374016 bytes
64 heads, 32 sectors/track, 715404 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sda1 2 2048 2096128 fd Linux raid autodetect
/dev/sda2 2049 2560 524288 fd Linux raid autodetect
/dev/sda3 2561 715404 729952256 fd Linux raid autodetect

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
64 heads, 32 sectors/track, 715404 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 2 2048 2096128 fd Linux raid autodetect
/dev/sdb2 2049 2560 524288 fd Linux raid autodetect
/dev/sdb3 2561 715404 729952256 fd Linux raid autodetect

Disk /dev/md2: 747.4 GB, 747471044608 bytes
2 heads, 4 sectors/track, 182488048 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 536 MB, 536805376 bytes
2 heads, 4 sectors/track, 131056 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md0: 2146 MB, 2146369536 bytes
2 heads, 4 sectors/track, 524016 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table



Here is the output of smartctl -A /dev/sdb:



=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 100 006 Pre-fail Always - 38042073
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 7
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 24494887
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 7935
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 7
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 062 052 045 Old_age Always - 38 (Min/Max 34/41)
194 Temperature_Celsius 0x0022 038 048 000 Old_age Always - 38 (0 26 0 0 0)
195 Hardware_ECC_Recovered 0x001a 032 026 000 Old_age Always - 38042073
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 101494372179726
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3317006641
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 2924590852


Answer



That drive, sdb, looks like it's not far from failing. It hasn't officially failed yet, but it doesn't have much life left in it.



195 Hardware_ECC_Recovered  0x001a   032   026   000    Old_age   Always       -       38042073


This drive has had a large number of recoverable read errors, meaning it successfully reconstructed the data using error correction. However, it's getting to the point where it will most likely soon hit an unrecoverable read error, where it cannot reconstruct the data on a damaged or failing section of the disk. At that point there's nothing you can do but replace the drive.
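One way to probe for that kind of surface damage is a SMART self-test. A sketch, assuming smartmontools is installed and the suspect drive is /dev/sdb as above:

```shell
# Queue a long (full-surface) self-test; it runs inside the drive's firmware
smartctl -t long /dev/sdb

# Once it finishes (smartctl prints an estimated duration), check the result
smartctl -l selftest /dev/sdb

# The attributes that most directly indicate surface damage
smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```

A long test that aborts with a read failure at a specific LBA is strong confirmation the drive needs replacing.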



If your rebuild keeps stopping at the same place, it's entirely possible the drive has already failed at that point on the platters and simply isn't reporting it. Desktop-class drives will stop and retry for minutes or even hours to read a particular sector if the first attempt fails, which leads to exactly this sort of behavior. And you probably have such a drive in this "server"...
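You can check for unreadable sectors directly with a full read pass. A sketch (read-only, but it will hammer the drive for hours; the status=progress flag needs a reasonably recent GNU coreutils):

```shell
# Read every block of sdb and discard the data; an unreadable sector
# shows up as an I/O error in dmesg with its offset
dd if=/dev/sdb of=/dev/null bs=1M conv=noerror status=progress
```

badblocks -sv /dev/sdb is an alternative read-only surface scan that reports bad blocks explicitly.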




At this point you should have that drive proactively replaced, since it is going to fail soon, if it hasn't already.
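A rough outline of the replacement, hedged: the exact steps depend on the hoster and on whether the box hot-swaps, and the commands below assume the new disk comes back as /dev/sdb and that the sdb partitions are still listed as members in /proc/mdstat (arrays already showing [U_] without an sdb member can skip the fail/remove step):

```shell
# Mark the old disk's partitions as failed and pull them from the arrays
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

# (physically swap the drive)

# Copy the partition table from the surviving disk to the new one
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Add the new partitions; the kernel resyncs the arrays in the background
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3

# If the machine boots from md0, reinstall the boot loader on the new disk
grub-install /dev/sdb
```

The server stays online throughout; the arrays just run degraded until the resync finishes.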

