linux - Degraded RAID5 and no md superblock on one of the remaining drives

This is actually on a QNAP TS-509 NAS. The RAID is basically a Linux RAID.

The NAS was configured as RAID 5 with 5 drives (/dev/md0 with /dev/sd[abcde]3). At some point /dev/sde failed and the drive was replaced. While the rebuild was still in progress, the NAS rebooted itself and /dev/sdc dropped out of the array. Now the array can't start because essentially 2 drives have dropped out. I disconnected /dev/sde and hoped that /dev/md0 could resume in degraded mode, but no luck. Further investigation shows that /dev/sdc3 has no md superblock. The data should be good, since the array has been unable to assemble ever since /dev/sdc dropped out.

All the searches I have done showed how to reassemble the array assuming 1 bad drive. But I think I just need to restore the superblock on /dev/sdc3, which should bring the array up in degraded mode. That would allow me to back up the data and then proceed with the rebuild by adding /dev/sde.

Any help would be greatly appreciated.

mdstat does not show /dev/md0

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md5 : active raid1 sdd2[2](S) sdc2[3](S) sdb2[1] sda2[0]
      530048 blocks [2/2] [UU]
md13 : active raid1 sdd4[3] sdc4[2] sdb4[1] sda4[0]
      458880 blocks [5/4] [UUUU_]
      bitmap: 40/57 pages [160KB], 4KB chunk
md9 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      530048 blocks [5/4] [UUUU_]
      bitmap: 33/65 pages [132KB], 4KB chunk

mdadm shows /dev/md0 is still there

# mdadm --examine --scan
ARRAY /dev/md9 level=raid1 num-devices=5 UUID=271bf0f7:faf1f2c2:967631a4:3c0fa888
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=0d75de26:0759d153:5524b8ea:86a3ee0d
   spares=2
ARRAY /dev/md0 level=raid5 num-devices=5 UUID=ce3e369b:4ff9ddd2:3639798a:e3889841
ARRAY /dev/md13 level=raid1 num-devices=5 UUID=7384c159:ea48a152:a1cdc8f2:c8d79a9c

With /dev/sde removed, here is the mdadm examine output showing sdc3 has no md superblock

# mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
  Creation Time : Sat Dec  8 15:01:19 2012
     Raid Level : raid5
  Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
     Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sat Dec  8 15:06:17 2012
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d9e9ff0e - correct
         Events : 0.394
         Layout : left-symmetric
     Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed
[~] # mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
  Creation Time : Sat Dec  8 15:01:19 2012
     Raid Level : raid5
  Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
     Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sat Dec  8 15:06:17 2012
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d9e9ff20 - correct
         Events : 0.394
         Layout : left-symmetric
     Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed
[~] # mdadm --examine /dev/sdc3
mdadm: No md superblock detected on /dev/sdc3.
[~] # mdadm --examine /dev/sdd3
/dev/sdd3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ce3e369b:4ff9ddd2:3639798a:e3889841
  Creation Time : Sat Dec  8 15:01:19 2012
     Raid Level : raid5
  Used Dev Size : 1463569600 (1395.77 GiB 1498.70 GB)
     Array Size : 5854278400 (5583.08 GiB 5994.78 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sat Dec  8 15:06:17 2012
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : d9e9ff44 - correct
         Events : 0.394
         Layout : left-symmetric
     Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     3       8       51        3      active sync   /dev/sdd3
   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3
   4     4       0        0        4      faulty removed

fdisk output shows /dev/sdc3 partition is still there.

[~] # fdisk -l
Disk /dev/sdx: 128 MB, 128057344 bytes
8 heads, 32 sectors/track, 977 cylinders
Units = cylinders of 256 * 512 = 131072 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdx1               1           8        1008   83  Linux
/dev/sdx2               9         440       55296   83  Linux
/dev/sdx3             441         872       55296   83  Linux
/dev/sdx4             873         977       13440    5  Extended
/dev/sdx5             873         913        5232   83  Linux
/dev/sdx6             914         977        8176   83  Linux
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          66      530113+  83  Linux
/dev/sda2              67         132      530145   82  Linux swap / Solaris
/dev/sda3             133      182338  1463569695   83  Linux
/dev/sda4          182339      182400      498015   83  Linux
Disk /dev/sda4: 469 MB, 469893120 bytes
2 heads, 4 sectors/track, 114720 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/sda4 doesn't contain a valid partition table
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          66      530113+  83  Linux
/dev/sdb2              67         132      530145   82  Linux swap / Solaris
/dev/sdb3             133      182338  1463569695   83  Linux
/dev/sdb4          182339      182400      498015   83  Linux
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          66      530125   83  Linux
/dev/sdc2              67         132      530142   83  Linux
/dev/sdc3             133      182338  1463569693   83  Linux
/dev/sdc4          182339      182400      498012   83  Linux
Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1          66      530125   83  Linux
/dev/sdd2              67         132      530142   83  Linux
/dev/sdd3             133      243138  1951945693   83  Linux
/dev/sdd4          243139      243200      498012   83  Linux
Disk /dev/md9: 542 MB, 542769152 bytes
2 heads, 4 sectors/track, 132512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md9 doesn't contain a valid partition table
Disk /dev/md5: 542 MB, 542769152 bytes
2 heads, 4 sectors/track, 132512 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md5 doesn't contain a valid partition table

Answer

Ouch!

All the searches I have done showed how to reassemble the array assuming 1 bad drive.

That is because RAID5 will not work with more than one failed drive. You cannot guarantee recovery of all the data with two drives missing. In fact, if both drives are fully inaccessible, recovery will fail: the data simply is not there anymore.
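
A minimal sketch of why this is so, using shell arithmetic on single bytes. The byte values are made up for illustration; real RAID5 applies the same XOR to whole chunks and rotates the parity across the drives.

```shell
#!/bin/sh
# One stripe of a 5-drive RAID5: four data blocks plus one parity block.
# The byte values are arbitrary examples, not taken from the array above.
d0=$((0x5a)); d1=$((0x3c)); d2=$((0xf0)); d3=$((0x0f))

# Parity is the XOR of all data blocks in the stripe.
parity=$(( d0 ^ d1 ^ d2 ^ d3 ))

# One block lost (say d2): XOR of the survivors and the parity restores it.
recovered_d2=$(( parity ^ d0 ^ d1 ^ d3 ))
echo "d2=$d2 recovered=$recovered_d2"

# Two blocks lost (d2 and d3): only their XOR can be computed, and many
# different pairs of values produce that same XOR.
d2_xor_d3=$(( parity ^ d0 ^ d1 ))
echo "d2 xor d3 = $d2_xor_d3"
```

With one block gone there is exactly one unknown and one equation; with two blocks gone there are two unknowns and still only one equation.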

Two notes:

I wrote fully inaccessible. As in a dead disk or a drive removed from the system, not just a single bad sector.

The usual rant that RAID is not a backup. If a RAID fails, you just have to keep the system up until 5 PM, back up the files changed since the last backup (using an incremental backup), and then you can either try a lengthy rebuild or recreate the RAID and restore from backup. Obviously as a home user you do things slightly differently, but the same problem persists when doing a RAID5 rebuild and hitting a URE (unrecoverable read error).

(Also see this canonical post on Server Fault and these two posts on Super User.)

In your case I see these options:

1. Send the drives to a data recovery lab. These services are really expensive.

2. Give up and restore from an old backup.

3. Try to mount the RAID array with two drives missing.

Before you try option 3: make a backup of the drives. Place them in another system and copy them with dd or ddrescue. Keep those images. If things fail, you can restore the current situation from them (read: things will not get worse).
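
The imaging step could look roughly like the following sketch. It assumes GNU ddrescue is installed on the rescue machine, that the four surviving disks show up there under the same names as on the NAS (check this before running anything), and that /mnt/backup is a hypothetical directory with enough free space. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: image each surviving member disk with GNU ddrescue.
# BACKUP_DIR and the disk names are assumptions; adjust them to your system.
BACKUP_DIR=${BACKUP_DIR:-/mnt/backup}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

image_disks() {
    for disk in sda sdb sdc sdd; do
        # -d: direct access to the input device, -r3: up to 3 retry passes
        # over bad areas. The map file lets an interrupted copy resume.
        run ddrescue -d -r3 "/dev/$disk" \
            "$BACKUP_DIR/$disk.img" "$BACKUP_DIR/$disk.map"
    done
}

image_disks
```

ddrescue is preferable to plain dd here because it keeps going past read errors and records them in the map file instead of aborting.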

You can then attempt the recovery either on the NAS or on the system where you stored the images. In the latter case, make a working copy of the images and attach them as loopback devices. If you have sufficient disk space this is the preferred way, though you need a place with twice the capacity of your entire NAS free.

Next, read this rather lengthy blog post at http://blog.al4.co.nz/2011/03/recovering-a-raid5-mdadm-array-with-two-failed-devices/.
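
As a rough worked example of the space involved, using the disk sizes from the fdisk output in the question and assuming whole-disk images of the four remaining drives (this needs a shell with 64-bit arithmetic, which a typical desktop Linux has):

```shell
#!/bin/sh
# Disk sizes in bytes, taken from the fdisk output in the question.
sda=1500301910016
sdb=1500301910016
sdc=1500301910016
sdd=2000398934016

# One full set of images, plus a second set as the working copy.
one_set=$(( sda + sdb + sdc + sdd ))
both_sets=$(( 2 * one_set ))

echo "one set of images:     $one_set bytes"
echo "images + working copy: $both_sets bytes (about $(( both_sets / 1000000000000 )) TB)"
```

Imaging only the sdX3 partitions that belong to /dev/md0 would need somewhat less. A working-copy image of a whole disk can then be attached with something like `losetup --find --show --partscan <image>`, which prints the loop device it picked.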

The essential steps in it are:
mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 missing

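The keyword missing marks drive 5 as absent. I selected that one because I have no idea what state it is in after a partial rebuild. Note that the device names above are the ones from the blog; on your NAS the members of /dev/md0 are /dev/sd[abcd]3. The level, device order, chunk size, layout and metadata version must all match the original array, otherwise the recreated array will not line up with the data on disk.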
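
Adapted to the array in the question, the command might look like the sketch below. Every parameter is read from the mdadm --examine output in the question (metadata version 00.90, 5 raid devices, 64K chunks, left-symmetric layout, used device size 1463569600 KiB, members sda3 to sdd3 in slots 0 to 3). Whether the mdadm build on the QNAP accepts all of these options is an assumption on my part, and on loopback working copies the device names will be different. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: recreate /dev/md0 with the geometry reported by
# `mdadm --examine` in the question. Try it on working copies first.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recreate_md0() {
    # --assume-clean: do not start a resync over the existing data.
    # --size is per device, in KiB ("Used Dev Size" in the examine output).
    # Device order follows the RaidDevice column; slot 4 stays "missing".
    run mdadm --create /dev/md0 \
        --assume-clean \
        --metadata=0.90 \
        --level=5 \
        --raid-devices=5 \
        --chunk=64 \
        --layout=left-symmetric \
        --size=1463569600 \
        /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 missing
}

recreate_md0
```

Spelling out every parameter avoids depending on the defaults of whatever mdadm version happens to be installed, which may differ from those used when the array was first created.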

With a bit of luck you can now mount it as a degraded array. Copy all data off it, then delete the array and rebuild it. It may hang while copying the data. In that case reboot, skip a few files and continue. It is far from perfect, but if recovery is too expensive and you have no backups, this might be the only way.
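
A cautious way to do the first mount, sketched under the assumption that the volume holds an ext3/ext4 filesystem directly on /dev/md0 and that /mnt/recovery is a hypothetical, already existing mount point. Everything here is read-only. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: inspect the recreated array and mount it without writing to it.
# /mnt/recovery is a made-up mount point; the ext filesystem is an assumption.
MOUNT_POINT=${MOUNT_POINT:-/mnt/recovery}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

inspect_and_mount() {
    # Show the array state; it should report 4 of 5 devices, degraded.
    run mdadm --detail /dev/md0
    # -n: check the filesystem but answer "no" to every repair question.
    run fsck -n /dev/md0
    # Mount read-only so nothing is changed while copying the data off.
    run mount -o ro /dev/md0 "$MOUNT_POINT"
}

inspect_and_mount
```

If fsck reports a flood of errors, the array parameters or the device order are probably wrong; stop the array and recheck them rather than letting fsck repair anything.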

Monday, August 17, 2015
