Thursday, August 18, 2016

linux - Best filesystem choices for NFS storing VMware disk images



Currently we use an iSCSI SAN as storage for several VMware ESXi servers. I am investigating the use of an NFS target on a Linux server for additional virtual machines. I am also open to the idea of using an alternative operating system (like OpenSolaris) if it will provide significant advantages.



What Linux-based filesystem favours very large contiguous files (like VMware's disk images)? Alternatively, how have people found ZFS on OpenSolaris for this kind of workload?



(This question was originally asked on SuperUser; feel free to migrate answers here if you know how).



Answer



I'd really recommend you take a look at ZFS, but to get decent performance you're going to need to pick up a dedicated device as a ZFS Intent Log (ZIL). Basically this is a small device (a few GB) that can write extremely fast (20k-100k IOPS), which lets ZFS immediately confirm that writes have been synced to storage while waiting up to 30 seconds to actually commit them to the hard disks in your pool. In the event of a crash/outage, any uncommitted transactions in the ZIL are replayed upon mount. As a result, in addition to a UPS you may want a drive with an internal power supply/super-capacitor so that any pending IOs make it to permanent storage in the event of a power loss. If you opt against a dedicated ZIL device, writes can have high latency, leading to all sorts of problems. Assuming you're not interested in Sun's 18GB write-optimized SSD "Logzilla" at ~$8200, some cheaper alternatives exist:




  • DDRDrive X1 - 4GB DDR2 + 4GB SLC Flash in a PCIe x1 card designed explicitly for ZIL use. Writes go to RAM; in the event of power loss, it syncs RAM to NAND in <60sec powered by a supercapacitor. (50k-300k IOPS; $2000 Direct, $1500 for .edu)

  • Intel X25-E 32GB 2.5-inch SSD (SLC, but no super cap, 3,300 write IOPS); $390 @ Amazon

  • OCZ Vertex 2 Pro 40GB 2.5-inch SSD (supercap, but MLC, 20k-50k write IOPS); $435 @ Amazon.
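
Whichever device you go with, attaching it to an existing pool as a log device is a one-liner. A minimal sketch, assuming a pool named tank and the SSD showing up as c1t2d0 (both placeholder names):

    # add a dedicated ZIL (slog) device to the pool
    zpool add tank log c1t2d0

    # or mirror two devices so a slog failure can't lose in-flight writes
    zpool add tank log mirror c1t2d0 c1t3d0

    # confirm it shows up under the "logs" section
    zpool status tank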



Once you've got OpenSolaris/Nexenta + ZFS set up, there are quite a few ways to move blocks between your OpenSolaris and ESX boxen; what's right for you depends heavily on your existing infrastructure (L3 switches, Fibre cards) and your priorities (redundancy, latency, speed, cost). But since you don't need specialized licenses to unlock iSCSI/FC/NFS functionality, you can evaluate anything you've got hardware for and pick your favorite:





  • iSCSI Targets (CPU overhead; no TOE support in OpenSolaris)

  • Fibre Channel Targets (Fibre cards ain't cheap)

  • NFS (VMware + NFS can be finicky, limited to 32 mounts)
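
If NFS wins, exporting a dataset to your ESX hosts takes two commands on the Solaris side. A sketch, assuming a pool named tank and ESX hosts on the 10.0.0.0/24 subnet (both placeholders; ESX mounts NFSv3 as root, hence the root= option):

    # create a dataset to hold the VM disk images
    zfs create tank/vmstore

    # export it read/write to the ESX subnet, allowing root access
    zfs set sharenfs=rw=@10.0.0.0/24,root=@10.0.0.0/24 tank/vmstore

Then point an NFS datastore in the VI client at yourhost:/tank/vmstore and you're off.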



If you can't spend $500 for evaluation, test with the ZIL disabled to see whether it's the bottleneck (it probably is), but don't run that way in production. Don't mess with ZFS deduplication just yet unless you also have lots of RAM and an SSD for L2ARC. It's definitely nice once you get it set up, but you should definitely try some NFS tuning before playing with dedup. Once you're saturating 1-2 Gb links, there are growth opportunities in 8Gb FC, 10GigE and InfiniBand, but each requires a significant investment even for evaluation.
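
For reference, that ZIL-off benchmark looks something like the sketch below. The per-dataset sync property only exists on newer builds; older OpenSolaris builds need the global zil_disable tunable instead (dataset name is a placeholder, and again: benchmarking only, never production):

    # newer builds: per-dataset, takes effect immediately
    zfs set sync=disabled tank/vmstore    # run your benchmark...
    zfs set sync=standard tank/vmstore    # ...then put it back

    # older builds: global tunable, add to /etc/system and reboot
    set zfs:zil_disable = 1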

