Sunday, August 23, 2015

linux - Ubuntu RAID 10 - unable to assemble

Here's a brief history of how I got here:

  1. 4x disk Ubuntu 12.04 software RAID10 with 5x partitions (md0 - md4)

  2. 1x disk died

  3. mdadm --fail > mdadm --remove > physically removed the drive and replaced it

  4. mdadm --add > disk resync'd perfectly for all partitions

  5. decided to replace all the disks so they were identical

  6. repeated steps 3-4 for the remaining 3 disks. The 2nd and 3rd disks went perfectly.

  7. after the final disk was replaced, I added it back to the array but was notified that the file system was in read-only mode.

  8. cat /proc/mdstat revealed that some partitions had dropped out, but the pattern was very inconsistent.

  9. I rebooted the machine (probably not the smartest idea)

  10. Machine wouldn't boot (no MBR on the new disk I assume).

  11. Replaced the last drive I had taken out. Machine boots to an initramfs prompt but the keyboard is unresponsive.

  12. Removed the last drive, so now only the 3x good disks remain.

  13. Booted from an Ubuntu Live USB.

  14. Ubuntu disk utility lists the 4x RAID devices and says they are inactive and partially assembled.

  15. ubuntu@ubuntu:~$ cat /proc/mdstat

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md3 : inactive sdd8[6](S) sdc8[5](S) sdb8[4](S)
    1464837120 blocks super 1.2

    md4 : inactive sdd9[6](S) sdc9[5](S) sdb9[4](S)
    718365696 blocks super 1.2

    md1 : inactive sdd6[6](S) sdc6[5](S) sdb6[4](S)
    146479104 blocks super 1.2

    md2 : inactive sdd7[6](S) sdc7[5](S) sdb7[4](S)
    585931776 blocks super 1.2

    md0 : inactive sdd5[6](S) sdc5[5](S) sdb5[4](S)
    14641152 blocks super 1.2

    unused devices:

  16. ubuntu@ubuntu:~$ sudo mdadm --assemble --verbose /dev/md0 -f /dev/sdb5 /dev/sdc5 /dev/sdd5

    mdadm: looking for devices for /dev/md0
    mdadm: cannot open device /dev/sdb5: Device or resource busy
    mdadm: /dev/sdb5 has no superblock - assembly aborted
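
The "Device or resource busy" error in step 16 most likely comes from the inactive md0 shown in step 15: an inactive array still holds its member partitions, so `mdadm --assemble` cannot claim /dev/sdb5 until that array is stopped (the answer below likewise starts with `mdadm --stop`). As a minimal sketch, assuming bash, awk and the /proc/mdstat layout shown above, the hypothetical `list_inactive_arrays` function prints each inactive array followed by its member devices, so you can see what has to be stopped before reassembling:

```shell
# Hypothetical helper: print "<array> <member> <member> ..." for every
# inactive md array. Assumes the /proc/mdstat layout shown above, e.g.
#   md0 : inactive sdd5[6](S) sdc5[5](S) sdb5[4](S)
# Pass a file path to parse a saved copy instead of the live /proc/mdstat.
list_inactive_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        # Field 1 is the array name, field 3 its state.
        $2 == ":" && $3 == "inactive" {
            line = $1
            for (i = 4; i <= NF; i++) {
                dev = $i
                sub(/\[.*$/, "", dev)   # strip the "[6](S)" role/spare suffix
                line = line " " dev
            }
            print line
        }
    ' "$mdstat"
}
```

For example, `list_inactive_arrays | while read -r md _; do sudo mdadm --stop "/dev/$md"; done` would stop every inactive array it finds; passing a saved copy of mdstat as the first argument lets you check the parsing without touching any arrays.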

So now I'm a bit stuck! The 3x disks still in there were all consistent at the moment the 4th disk was replaced, and SMART checks come back OK (no bad sectors, etc.).

I just need a way to restore the array with 3x disks so I can re-add the 4th. Any thoughts?

Many thanks!

Answer

I resolved this as follows from an Ubuntu Live USB (had to install mdadm):

  1. mdadm --stop /dev/md[01234]

  2. mdadm --assemble /dev/md0 --verbose /dev/sd[abc]5 (note I used just the 3x good drives).

  3. repeat for each /dev/mdx

  4. if I got a "device or resource busy" error I would --stop that /dev/mdx again and repeat the --assemble. No idea why this worked but it did.

  5. mdadm --manage /dev/mdx --add /dev/sddx for each /dev/mdx and respective /dev/sddx partition

  6. All disks were sync'd in their arrays, mdadm happy.

  7. Followed the instructions here https://help.ubuntu.com/community/Grub2/Installing (mount OS filesystem > chroot > install grub on each drive > update grub > reboot)

  8. ???

  9. Profit. Machine booted, detected filesystem errors on the OS partition but repaired them (the disks probably dropped out at slightly different times). After repairing and a reboot it's all back and running with no data loss.
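
The answer's steps 1 to 7 can be sketched as a few bash functions. This is a minimal sketch, not the author's exact commands, and it assumes: md0 to md4 sit on partitions 5 to 9 of each disk (as in the question's /proc/mdstat), the good disks and the replacement disk are passed in by name, the OS filesystem (including /boot) lives on a single array whose name you supply, and everything runs as root. The function names and the `DRY_RUN` switch are hypothetical; with `DRY_RUN=1` each command is printed instead of executed, which is a safe way to check device names first, since they appear to have changed between the question (good disks sdb to sdd) and the answer (sda to sdc).

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assemble one array from the given good disks; on failure (for example
# "Device or resource busy"), stop the array and try once more, as in step 4.
assemble_array() {
    local md="$1" part="$2"
    shift 2
    local members=() disk
    for disk in "$@"; do
        members+=("/dev/${disk}${part}")
    done
    run mdadm --assemble "/dev/$md" --verbose "${members[@]}" && return 0
    run mdadm --stop "/dev/$md"
    run mdadm --assemble "/dev/$md" --verbose "${members[@]}"
}

# Steps 1-5: stop md0..md4, reassemble each from the good disks, then add
# the replacement disk. Usage: recover_arrays NEW_DISK GOOD_DISK...
recover_arrays() {
    local new="$1" n part
    shift
    run mdadm --stop /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4
    for n in 0 1 2 3 4; do
        part=$((n + 5))    # md0 is on partition 5, ..., md4 on partition 9
        assemble_array "md$n" "$part" "$@" || return 1
        run mdadm --manage "/dev/md$n" --add "/dev/${new}${part}"
    done
}

# Step 7 (chroot method): mount the OS array, bind the pseudo-filesystems,
# install GRUB on every disk, regenerate the config, then unmount.
# Usage: reinstall_grub ROOT_MD DISK...
reinstall_grub() {
    local root_md="$1" fs disk
    shift
    run mount "/dev/$root_md" /mnt
    for fs in dev proc sys; do
        run mount --bind "/$fs" "/mnt/$fs"
    done
    for disk in "$@"; do
        run chroot /mnt grub-install "/dev/$disk"
    done
    run chroot /mnt update-grub
    for fs in sys proc dev; do
        run umount "/mnt/$fs"
    done
    run umount /mnt
}
```

For example, `DRY_RUN=1 recover_arrays sdd sda sdb sdc` prints the stop, assemble and add commands for all five arrays, and `DRY_RUN=1 reinstall_grub md0 sda sdb sdc sdd` prints the GRUB steps. Using md0 as the OS array is only a placeholder, since the question does not say which array holds the root filesystem.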