Tuesday, August 29, 2017

linux - Recovering ZFS pool with errors on import

I have a machine that had trouble with a bad stick of RAM. After I diagnosed the problem and removed the offending stick, the ZFS pool in the machine was trying to access drives under incorrect device names. I exported the pool and re-imported it to correct this, but now I am getting errors.
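For context, importing by stable device identifiers instead of kernel names like sdX is the usual way to avoid this renaming problem. A minimal sketch, assuming the pool can be exported cleanly first:

```shell
# Re-import using persistent /dev/disk/by-id names so the pool no longer
# depends on whatever sdX ordering the kernel assigns at boot.
sudo zpool export Storage
sudo zpool import -d /dev/disk/by-id Storage
```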



The pool Storage no longer mounts automatically:



sqeaky@sqeaky-media-server:/$ sudo zpool status
no pools available



A regular import says it's corrupt:



sqeaky@sqeaky-media-server:/$ sudo zpool import
pool: Storage
id: 13247750448079582452
state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:


	Storage                 UNAVAIL  insufficient replicas
	  raidz1                UNAVAIL  corrupted data
	    805066522130738790  ONLINE
	    sdd3                ONLINE
	    sda3                ONLINE
	    sdc                 ONLINE


A specific import says the vdev configuration is invalid:




sqeaky@sqeaky-media-server:/$ sudo zpool import Storage
cannot import 'Storage': invalid vdev configuration


I cannot offline or detach the drive because the pool cannot be imported:



sqeaky@sqeaky-media-server:/$ sudo zpool offline Storage 805066522130738790
cannot open 'Storage': no such pool
sqeaky@sqeaky-media-server:/$ sudo zpool detach Storage 805066522130738790
cannot open 'Storage': no such pool



I cannot force the import either:



sqeaky@sqeaky-media-server:/$ sudo zpool import -f Storage 
cannot import 'Storage': invalid vdev configuration
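On newer ZFS releases there are additional import options worth trying before giving up. This is a hedged sketch: the `-F` rewind option and read-only imports were added in later pool/tool versions and may not be available on an older on-disk version like this one.

```shell
# Try a read-only import first (it does not write to the pool), then a
# rewind import: -F discards the last few transaction groups if the most
# recent ones are damaged.
sudo zpool import -o readonly=on -f Storage
sudo zpool import -F -f Storage
```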


I should have 4 devices in my ZFS pool:





/dev/sda3
/dev/sdd3
/dev/sdc
/dev/sdb




I have no clue what 805066522130738790 is, but I plan on investigating further. I am also trying to figure out how to use zdb to get more information about what the pool thinks is going on.
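zdb can dump the on-disk vdev labels of each member device; comparing the guid fields in those labels against 805066522130738790 should show which device the pool is looking for. A sketch, assuming the four devices listed above (reading labels is non-destructive):

```shell
# Print the ZFS labels stored on each device/partition.
# Each label records the pool guid and the per-vdev guid.
for dev in /dev/sda3 /dev/sdd3 /dev/sdc /dev/sdb; do
    echo "== $dev =="
    sudo zdb -l "$dev"
done
```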



For reference, the machine was set up this way because, at the time it was built, it needed certain Linux features and booting from ZFS wasn't yet supported on Linux. The partitions sda1 and sdd1 are in a RAID 1 for the operating system, and sdd2 and sda2 are in a RAID 1 for swap.



Any clue on how to recover this ZFS pool?




Edit - Status update
I figured out what 805066522130738790 is: a GUID that ZFS was failing to use to identify /dev/sdb. When I physically remove /dev/sdb, the pool imports and comes online, but I still cannot swap out the disks. I guess I will back up the files to external media, then blow away the whole pool, because it is too corrupt to keep functioning. I should have had good backups from the start...
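Since the pool imports with /dev/sdb removed, one way to do that backup is a recursive snapshot streamed through zfs send to the external media. A minimal sketch, with a hypothetical mount point /mnt/external:

```shell
# Snapshot every dataset in the pool, then stream the whole tree to a
# single compressed file on the external drive (restorable with zfs receive).
sudo zfs snapshot -r Storage@rescue
sudo zfs send -R Storage@rescue | gzip > /mnt/external/storage-rescue.zfs.gz
```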

