Tuesday, February 23, 2016

linux - ZFS RAID0 pool without redundancy




I created a ZFS pool on Ubuntu 14.04 without specifying RAID or redundancy options, wrote some data to it, rebooted the machine, and now the pool is no longer available (UNAVAIL). I don't have the exact error to hand, but it mentioned that there were insufficient replicas available. I created two datastores in the pool, which consists of two 3 TB disks. ZFS was recommended to me for its deduplication abilities and I'm not concerned with redundancy at this point.



I actually only want RAID0, so no mirroring or redundancy in the short term. Is there a way to do this with ZFS, or would I be better off with LVM?



zpool status -v:

sudo zpool status -v
  pool: cryptoporticus
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        cryptoporticus  UNAVAIL      0     0     0  insufficient replicas
          sda           ONLINE       0     0     0
          sdc           UNAVAIL      0     0     0


UPDATE



zpool export cryptoporticus, then zpool import cryptoporticus resolved this for now. Is this likely to happen again on reboot?


Answer



You are likely seeing a situation where at least one of the pool's disks became unavailable. This might be intermittent and resolvable; both Linux implementations (ZFS on Linux as well as zfs-fuse) seem to exhibit occasional hiccups which are easily cured by a zpool clear or a zpool export / zpool import cycle.
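A minimal sketch of that recovery cycle, assuming the pool name from the question (cryptoporticus):

# Try clearing transient device errors first
sudo zpool clear cryptoporticus

# If the pool is still UNAVAIL, export and re-import it
sudo zpool export cryptoporticus
sudo zpool import cryptoporticus

# Verify that all devices show up ONLINE again
sudo zpool status -v cryptoporticus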




As for your question, yes, ZFS is perfectly capable of creating and maintaining a pool without any redundancy just by issuing something like zpool create mypool sdb sdc sdd.
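For example, a non-redundant (striped) pool across two whole disks could be created along these lines; the pool and device names are placeholders, and -f is only needed if the disks still carry old labels or partitions:

# Create a striped (RAID0-style) pool with no redundancy
sudo zpool create -f mypool sdb sdc

# Both disks should appear as top-level vdevs (no mirror or raidz)
sudo zpool status mypool
sudo zpool list mypool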



But personally, I would not use ZFS just for its deduplication capabilities. Due to its architecture, ZFS deduplication requires a large amount of RAM and plenty of disk I/O for write operations. You will probably find it unsuitable for pools as large as yours, as writes will become painfully slow. If you need deduplication, you might want to look at offline dedup implementations with a smaller memory and I/O footprint, such as btrfs file-level batch deduplication using bedup or block-level deduplication using duperemove: https://btrfs.wiki.kernel.org/index.php/Deduplication
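If you want to gauge the cost before enabling ZFS dedup, or try offline dedup on btrfs instead, something like the following could be a starting point; the pool name and mount point are assumptions, and zdb -S only simulates dedup without touching any data:

# Estimate the dedup table size and ratio for an existing ZFS pool (simulation only)
sudo zdb -S cryptoporticus

# Block-level offline dedup on a btrfs filesystem with duperemove
# (-d actually performs the dedup, -r recurses into subdirectories)
sudo duperemove -dr /mnt/btrfs-data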

