Tuesday, April 21, 2015

raidz - ZFS Recover from Faulted Pool State



I have a six disk ZFS raidz1 pool and had a recent failure requiring a disk replacement. No problem normally, but this time my server hardware died before I could do the replacement (but after and unrelated to the drive failure as far as I can tell).




I was able to get another machine from a friend to rebuild the system, but in the process of moving my drives over I had to swap their cables around a bunch until I got the right configuration where the remaining 5 good disks were seen as online. This process seems to have generated some checksum errors for the pool/raidz.



I have the 5 remaining drives set up now and a good drive installed and ready to take the place of the drive that died. However, since my pool state is FAULTED I'm unable to do the replacement.



root@zfs:~# zpool replace tank 1298243857915644462 /dev/sdb
cannot open 'tank': pool is unavailable


Is there any way to recover from this error? I would think that having 5 of the 6 drives online would be enough to rebuild the right data, but that doesn't seem to be enough now.




Here's the status log of my pool:



root@zfs:~# zpool status tank
  pool: tank
 state: FAULTED
status: One or more devices could not be used because the label is missing or invalid.
        There are insufficient replicas for the pool to continue functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     FAULTED      0     0     1  corrupted data
          raidz1-0               ONLINE       0     0     8
            sdd                  ONLINE       0     0     0
            sdf                  ONLINE       0     0     0
            sdh                  ONLINE       0     0     0
            1298243857915644462  UNAVAIL      0     0     0  was /dev/sdb1
            sde                  ONLINE       0     0     0
            sdg                  ONLINE       0     0     0


Update (10/31): I tried to export and re-import the array a few times over the past week and wasn't successful. First I tried:



zpool import -f -R /tank -N -o readonly=on -F tank


That produced this error immediately:




cannot import 'tank': I/O error
Destroy and re-create the pool from a backup source.


I added the '-X' option to the above command to try to make it check the transaction log. I let that run for about 48 hours before giving up because it had completely locked up my machine (I was unable to log in locally or via the network).



Now I'm trying a simple zpool import tank command and that seems to run for a while with no output. I'll leave it running overnight to see if it outputs anything.



Update (11/1): zpool import tank has been running for about 12 hours now with no command line output so far. However, my computer is still responsive so that's a plus.


Answer




Basically, there is no official way to recover other than restoring from backup.
However, ZFS has a feature called rewind, which may be able to discard recent transactions
and roll the pool back to a point where it is functional again.
The following text is from part #11 of the ZFS Internals blog:




DO NOT TRY IT IN PRODUCTION. USE AT YOUR OWN RISK!



zpool import -FX mypool where options mean:
* -F Attempt rewind if necessary.
* -X Turn on extreme rewind.
* -T Specify a starting txg to use for import. This option is intentionally undocumented and is intended for testing purposes.
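As a rough illustration of how these options escalate, the sketch below prints (but does not run) three import attempts, from least to most invasive. The pool name tank and the readonly=on property come from the question. The -n dry-run flag is documented for use with -F in the zpool manual page, but check that your ZFS version supports it. The function name rewind_plan is made up for this example.

```shell
#!/bin/sh
# Sketch only: prints import attempts from least to most invasive.
# Nothing here touches a pool; the commands are echoed, not executed.
rewind_plan() {
    pool=$1   # pool name, e.g. "tank" from the question

    # 1. Dry run: with -F, -n reports whether a rewind could make the
    #    pool importable, without actually performing the rewind.
    echo "zpool import -F -n $pool"

    # 2. Ordinary rewind, imported read-only.
    echo "zpool import -o readonly=on -F $pool"

    # 3. Extreme rewind (-X), still read-only. This can run for a very long time.
    echo "zpool import -o readonly=on -F -X $pool"
}

rewind_plan tank
```

Review each printed command before running it by hand, and move to the next step only if the previous one fails.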





First I tried to recover using this rewind procedure. It didn't work for me; maybe it is not implemented in zfs-fuse for Linux.
According to ZFSOnDiskFormat.pdf, there is an array of 128 uberblocks, each with its own txg.
In my zfs-fuse version 0.7.0, the -T option doesn't exist. So I modified zfs-fuse to list the txgs available in the uberblock array and to allow starting from a txg with a specific ID. Using the modified zfs-fuse, I was able to access the filesystems in the pool.
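To make the uberblock array more concrete, here is a small arithmetic sketch of where a given txg would land in it. It assumes the layout described in the on-disk format document: a 128 KiB array of 128 one-KiB slots, starting 128 KiB into each 256 KiB vdev label. It also assumes slots are written round-robin as txg modulo 128. Devices with larger sectors use fewer, larger slots, so treat the constants as assumptions. The function name uberblock_offset and the txg value 1000 are purely illustrative.

```shell
#!/bin/sh
# Assumed layout (see lead-in): 128 slots of 1 KiB each,
# beginning 128 KiB into the vdev label.
UB_ARRAY_START=$((128 * 1024))
UB_SLOT_SIZE=1024
UB_SLOT_COUNT=128

# Print the byte offset, relative to the start of a label, of the
# slot that would hold the uberblock for the given txg.
uberblock_offset() {
    txg=$1
    slot=$((txg % UB_SLOT_COUNT))                  # round-robin slot index
    echo $((UB_ARRAY_START + slot * UB_SLOT_SIZE))
}

uberblock_offset 1000
```

Because only 128 slots exist, older txgs are overwritten as new ones are committed, which limits how far back a rewind can reach. On ZFS builds whose zdb supports it, zdb -l -u on a member device prints the uberblocks actually present in its labels.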



I did recover my pool using this method. So recovery is possible, but this is an unsupported method and has to be done very carefully, as it is easy to make things even worse.
In my opinion, Sun/Oracle should provide an fsck for ZFS for these situations.

