Saturday, December 19, 2015

linux - Slow performance due to txg_sync for ZFS 0.6.3 on Ubuntu 14.04




I am using native ZFS with "ZFS on Linux" installed from the PPA here. Setup was not a problem, and I am using it in a mirrored configuration with two WD 4TB Red HDDs. Unfortunately, I am having performance issues when writing to the disk array; read performance is fine.



The problem is that during large writes to the array, the copy process stalls to ~5-10MB/s roughly every 5 seconds, as reported by rsync. The speed in between stalls is ~75MB/s, which is in line with other filesystems and what I would expect from the system (I tried btrfs, which gets ~85MB/s). Looking at iotop, I found that the copy stalls coincide with the txg_sync process hogging I/O. This appears to be the "bursty" I/O behaviour that seems to be a common issue with ZFS (see here and here). I have applied the option from the first link:



options zfs zfs_prefetch_disable=1


which helped somewhat with the performance issues, but did not solve them. The 5-second interval of txg_sync appears to correspond to the transaction-group timeout (vfs.zfs.txg.timeout="5", i.e. 5 seconds; the equivalent module parameter on Linux is zfs_txg_timeout), which is the default setting of ZFS on Linux.
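

For what it's worth, the transaction-group timeout can be inspected and changed at runtime via the zfs_txg_timeout module parameter. A minimal sketch (the txgs kstat path is what I see on my system; availability of these knobs may differ between ZoL versions):


$ # Check the current transaction-group timeout (default: 5 seconds)
$ cat /sys/module/zfs/parameters/zfs_txg_timeout
5
$ # Temporarily raise it to 10 seconds to see if the stalls become less frequent
$ echo 10 | sudo tee /sys/module/zfs/parameters/zfs_txg_timeout
$ # Watch per-txg sync statistics for the pool
$ cat /proc/spl/kstat/zfs/tank/txgs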




Is this normal behaviour, or are there other settings I can try? If so, any suggestions? Note that I couldn't find many of the options from the two links...



EDIT 2: To follow up a little: the system I am using is an HP ProLiant MicroServer N36L, which I upgraded to 8GB of ECC RAM. The commands I used for creating the ZFS pool are given below. Note that I am using -o ashift=12, as I found on the zfsonlinux FAQ that this should get ZFS to play nicely with the 4096-byte blocks of Advanced Format disks.



$ zpool create -o ashift=12 -m /zpools/tank tank mirror ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0871252 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3PKP1R0
$ zfs set relatime=on tank
$ zfs set compression=lz4 tank
$ zfs create -o casesensitivity=mixed tank/data
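

To verify that the ashift setting actually took effect, it can be read back as a pool property (this is also how the value shows up in the zpool get all output below):


$ zpool get ashift tank
NAME  PROPERTY  VALUE  SOURCE
tank  ashift    12     local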



I added the zfs_prefetch_disable option to /etc/modprobe.d/zfs.conf to make the change permanent:



options zfs zfs_prefetch_disable=1


So that:



$ cat /sys/module/zfs/parameters/zfs_prefetch_disable 
1
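

Note that the modprobe.d entry only takes effect when the zfs module is (re)loaded. For testing, the parameter can also be flipped at runtime without a reboot, for example:


$ echo 1 | sudo tee /sys/module/zfs/parameters/zfs_prefetch_disable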



EDIT 1: As requested, I added the zpool get all output. Note that I forgot to mention that I turned on compression on the pool...



$ zpool get all
NAME  PROPERTY               VALUE                  SOURCE
tank  size                   3.62T                  -
tank  capacity               39%                    -
tank  altroot                -                      default
tank  health                 ONLINE                 -
tank  guid                   12372923926654962277   default
tank  version                -                      default
tank  bootfs                 -                      default
tank  delegation             on                     default
tank  autoreplace            off                    default
tank  cachefile              -                      default
tank  failmode               wait                   default
tank  listsnapshots          off                    default
tank  autoexpand             off                    default
tank  dedupditto             0                      default
tank  dedupratio             1.00x                  -
tank  free                   2.21T                  -
tank  allocated              1.42T                  -
tank  readonly               off                    -
tank  ashift                 12                     local
tank  comment                -                      default
tank  expandsize             0                      -
tank  freeing                0                      default
tank  feature@async_destroy  enabled                local
tank  feature@empty_bpobj    active                 local
tank  feature@lz4_compress   active                 local


Answer



Pacoman,
It seems that because you have two WD Red drives in a mirror, the I/O from flushing the ZIL consistency group to disk is causing the high I/O load. There is always a ZIL (write cache): if you do not have any dedicated log devices, the log lives on the pool itself and can be as large as maximum write speed * 5 seconds. You are probably reading from the ZIL and committing the data to permanent storage every 5 seconds. Questions:




  1. Do you have a SLOG device? This is ideally a DRAM drive (HGST ZeusRAM, etc.).

  2. Do you have any cache devices to read from? Ideally a bunch of flash, like a 480GB PCIe card.




My recommendation would be to create a SLOG somewhere other than the pool (even the boot device is better than nowhere, assuming it is NOT flash). This way you aren't reading from and writing to the mirror intensively every 5 seconds.
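

As a rough sketch (the device names here are hypothetical placeholders; substitute whatever SSD or partition you have available), a separate log and, optionally, a cache device would be attached like this:


$ # Hypothetical /dev/disk/by-id names -- use your own devices
$ zpool add tank log ata-EXAMPLE_SSD-part1
$ zpool add tank cache ata-EXAMPLE_SSD-part2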

