My understanding is that one scenario ZFS addresses is where a RAID5 drive fails, and then during the rebuild the array encounters some corrupt blocks and thus cannot restore that data. From Googling around I don't see this failure scenario demonstrated; I find articles on recovering from a disk failure, or articles on healing data corruption, but not both at once.
1) Is ZFS using a 3-drive raidz1 susceptible to this problem? I.e. if one drive is lost and replaced, and data corruption is encountered while reading/resilvering, there is no remaining redundancy to repair that data. My understanding is that the corrupted data will be lost, correct?
(I do understand that periodic scrubbing minimizes the risk, but let's assume a tiny amount of corruption occurred on one disk since the last scrub, a different disk then failed, and the corruption is only detected during the rebuild.)
2) Does a 4-drive raidz2 setup protect against this scenario?
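To make questions 1 and 2 concrete, here's roughly what I mean by those two layouts (pool and device names are just placeholders, and the two create commands are alternatives, not one pool):

    zpool create tank raidz1 sda sdb sdc         # question 1: 3-drive raidz1, single parity
    zpool create tank raidz2 sda sdb sdc sdd     # question 2: 4-drive raidz2, double parity

    zpool scrub tank                             # periodic scrub, catches latent corruption while redundancy is intact
    zpool replace tank sdb sde                   # after a failure: swap in a new disk, which triggers the rebuild/resilver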
3) Would a two-drive mirrored setup with copies=2 protect against this scenario? I.e. one drive fails, but the surviving drive contains two copies of all data, so if corruption is encountered during the rebuild, there is still a redundant copy on that disk to restore from?
It's appealing to me because it uses half as many disks as the raidz2 setup, even though I'd need larger disks.
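Something like the following is what I have in mind for option 3 (placeholder names again; as I understand it, copies=2 only applies to data written after the property is set):

    zpool create tank mirror sda sdb    # two-way mirror
    zfs set copies=2 tank               # keep two copies of every block on each disk, for newly written data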
I am not committed to ZFS, but it is what I've read the most about, off and on, for a couple of years now.
It would be really nice if there were something similar to a PAR archive / Reed-Solomon scheme that generates some amount of parity protecting against up to, say, 10% data corruption, while only using an amount of space proportional to whatever x% corruption protection you want. Then I'd just use a mirror setup, and each disk in the mirror would carry a copy of that parity, which would be relatively small compared to option #3 above.

Unfortunately I don't think Reed-Solomon fits this scenario very well. I've been reading an old NASA document on implementing Reed-Solomon (the only comprehensive explanation I could find that didn't require buying a journal article), and as far as my understanding goes, the parity data would need to be completely regenerated for each incremental change to the source data; i.e. there's no easy way to make incremental updates to the Reed-Solomon parity in response to small incremental changes to the source. I'm wondering whether something similar in concept (a proportionally small amount of parity protecting against X% corruption anywhere in the source data) exists that someone is aware of, but I suspect that's a pipe dream.
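For reference, the PAR route I'm picturing looks something like this with par2cmdline (file names are placeholders, and if I'm remembering the flags right, -r sets the redundancy percentage); the catch is that the recovery files have to be rebuilt whenever the source data changes:

    par2 create -r10 backup.par2 bigfile.dat    # generate roughly 10% worth of recovery data
    par2 verify backup.par2                     # check the source file against the recovery data
    par2 repair backup.par2                     # attempt a repair if corruption was found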