Monday, March 16, 2015

zfsonlinux - ZFS copy on write

I am planning to use ZFS for backups. 5-10 servers will "stream" updates via DRBD to very large files (500 gigabytes each) on a ZFS file system.



Each server will generate about 20 megabytes per second of random writes, roughly 100 MB/s in total. I never read these files, so the pattern should be almost 100% writes.
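To get a feel for that workload before involving DRBD at all, a rough fio job file along the lines of the one below could emulate it (the file names, number of jobs and runtime here are illustrative assumptions, not my actual setup):

# drbd-like.fio -- rough sketch of the expected aggregate workload
# (file names, number of jobs and runtime are assumptions for illustration)
[global]
ioengine=libaio
rw=randwrite
bs=4k
iodepth=1
# cap each job at roughly 20 MB/s of random writes, like one server
rate=20m
time_based
runtime=600

[server1]
filename=/mnt/data/drbd1.img
size=500g

[server2]
filename=/mnt/data/drbd2.img
size=500g

# add more [serverN] sections for the remaining servers, then run: fio drbd-like.fio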




Copy-on-write is a very important feature for me.



As I understand it, COW should transform random writes into sequential writes. But this is not happening.
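One way to check whether that aggregation is really happening at the device level (not part of my original test commands, just how it can be observed) is to watch the request sizes the disk receives while a random-write benchmark runs:

# Request sizes seen by the physical disk while the benchmark runs;
# a large avgrq-sz (in 512-byte sectors) means the 4K random writes are
# being merged into bigger sequential writes, a small one means they are not
iostat -xm sdg 1

# The same picture from the ZFS side, per vdev (pool name used later in this post)
zpool iostat -v bigdata 1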



I tested on a server with 12 SAS drives, an E5520 Xeon (4 cores) and 24 GB of RAM, and random write performance was very low.



I decided to debug it first on a single SAS HDD in the same server.



I created an ext4 file system and ran some tests:




 
root@zfs:/mnt/hdd/test# dd if=/dev/zero of=tempfile bs=1M count=4096 conv=fdatasync,notrunc
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.2524 s, 142 MB/s


So the sequential write speed is about 140 MB/s.




Random writes run at about 500 KB/s, i.e. 100-150 IOPS (137 IOPS × 4 KiB ≈ 550 KB/s), which is normal for a single spinning disk:




fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
bs: 1 (f=1): [w(1)] [0.6% done] [0KB/548KB/0KB /s] [0/137/0 iops] [eta 02h:02m:57s]



Then I created a ZFS pool on the same drive:



zpool create -f -m /mnt/data bigdata scsi-35000cca01b370274



I set the record size to 4K because I will have 4K random writes. A 4K record size performed better than the default 128K when I was testing.



zfs set recordsize=4k bigdata
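As a sanity check (standard zfs/zdb commands, shown here as a sketch rather than output from my runs), the settings that matter for this test can be verified like this:

# Dataset properties that influence random-write behaviour
zfs get recordsize,compression,dedup,sync,primarycache bigdata

# Sector-size alignment the pool was created with (shows up as "ashift")
zdb -C bigdata | grep ashift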



I tested random writes to a 4G file.





fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=./test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
./test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
./test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/115.9MB/0KB /s] [0/29.7K/0 iops] [eta 00m:00s]



Looks like COW did well here: 115.9 MB/s.



Then I tested random writes to a 16G file.




fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=./test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite

test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process

bs: 1 (f=1): [w(1)] [0.1% done] [0KB/404KB/0KB /s] [0/101/0 iops] [eta 02h:08m:55s]


Very poor results: about 400 KB/s.



I tried the same with an 8G file:




fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=./test8G --bs=4k --iodepth=1 --size=8G --readwrite=randwrite


test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 8192MB)

bs: 1 (f=1): [w(1)] [14.5% done] [0KB/158.3MB/0KB /s] [0/40.6K/0 iops] [eta 00m:53s]


At the beginning COW was fine, about 136 MB/s according to iostat:





Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 1120.00 0.00 136.65 249.88 9.53 8.51 0.00 8.51 0.89 99.24


But at the end, when the test reached about 90%, the write speed went down to 5 MB/s:




Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 805.90 0.00 5.33 13.54 9.95 12.34 0.00 12.34 1.24 100.00


So 4G files are fine and 8G is almost fine, but 16G files are not getting any benefit from COW. Note that the average request size (avgrq-sz) in the iostat output also drops from about 250 sectors (~125 KB) at the start to about 14 sectors (~7 KB) at the end, so the 4K writes are no longer being merged into larger sequential writes.



I don't understand what is happening here. Maybe memory caching plays a role.
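If caching really is involved, the ARC should show it. A minimal way to watch it during a run (standard zfsonlinux interfaces; shown as a sketch, not output from my tests):

# ARC size limit; 0 means the default of roughly half of RAM (~12 GB on this box)
cat /sys/module/zfs/parameters/zfs_arc_max

# Current ARC size ("size"), target ("c") and hard maximum ("c_max"), in bytes
grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats

# Or live, if the arcstat.py script shipped with zfsonlinux is installed
arcstat.py 1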



OS: Debian 8
ZFS pool version 5000.

No compression or deduplication.
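Pool version 5000 does not identify the zfsonlinux release itself; the module version can be read directly (a sanity-check command, not part of the original output):

# Exact version of the loaded zfs kernel module
cat /sys/module/zfs/version
modinfo zfs | grep -iw version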





zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
bigdata  1.81T  64.4G  1.75T         -    2%     3%  1.00x  ONLINE  -


root@zfs:/mnt/data/test# zdb
bigdata:
    version: 5000
    name: 'bigdata'
    state: 0
    txg: 4
    pool_guid: 16909063556944448541
    errata: 0
    hostid: 8323329
    hostname: 'zfs'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 16909063556944448541
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8547466691663463210
            path: '/dev/disk/by-id/scsi-35000cca01b370274-part1'
            whole_disk: 1
            metaslab_array: 34
            metaslab_shift: 34
            ashift: 9
            asize: 2000384688128
            is_log: 0
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data




zpool status bigdata
  pool: bigdata
 state: ONLINE
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        bigdata                   ONLINE       0     0     0
          scsi-35000cca01b370274  ONLINE       0     0     0

errors: No known data errors


fio does not work with O_DIRECT on ZFS, so I had to run it without that flag. As I understand it, running buffered should produce even better results, but that is not happening.




fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=./test --filename=test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite
./test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1

fio-2.1.11
Starting 1 process
fio: looks like your file system does not support direct=1/buffered=0
fio: destination does not support O_DIRECT
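Since direct=1 is rejected, one alternative for keeping cache effects out of the numbers would be to fsync after every write (a variation I have not benchmarked here, shown only as a sketch):

# Same random-write test, but fsync after each write so the result reflects
# what actually reaches the disk rather than what lands in cache
fio --randrepeat=1 --ioengine=libaio --fsync=1 --gtod_reduce=1 --name=test --filename=./test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite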
