Tuesday, June 27, 2017

linux - zram filesystem reports different usage at the device level from that reported by the filesystem



We have a zram device defined on our host with an 80GB memory limit, and on it a 170GB ext4 filesystem:



# Virtual size of the block device, i.e. what the filesystem sees
echo 170G > /sys/block/zram0/disksize

# Limit on the compressed memory the device may consume
echo 80G > /sys/block/zram0/mem_limit

/usr/sbin/mkfs.ext4 -q -m 0 /dev/zram0
/usr/bin/mount /dev/zram0 /var/zram
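
As a sanity check after setup, the device-level counters can be read straight out of sysfs and compared against df. This assumes the older per-field sysfs files used later in this post (orig_data_size, mem_used_total, and so on); newer kernels fold the same figures into /sys/block/zram0/mm_stat.

cat /sys/block/zram0/disksize        # virtual device size in bytes
cat /sys/block/zram0/mem_used_total  # compressed memory currently allocated
df -k /var/zram                      # the filesystem's view of the same device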


This filesystem is used by our application for rapidly accessing large amounts of ephemeral data.



The filesystem size displayed by df matches the zram size reported in /sys/block/zram0/disksize.



Copying test data into the empty filesystem, we verified that a 2.2 : 1 compression ratio is achieved, so the filesystem fills up before we hit the zram memory limit. The /sys/block/zram0/orig_data_size value matches the usage reported by the filesystem:




# expr `cat orig_data_size` / 1024 ; df -k /dev/zram0
112779188
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zram0 175329308 112956600 62356324 65% /var/zram


However, when the application is running with live data over a longer period, we find that this no longer matches.



# expr `cat orig_data_size` / 1024 ; df -k /dev/zram0
173130200
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zram0 175329308 112999496 62313428 65% /var/zram


Now the filesystem reports a usage of approximately 110GB, but the zram device reports 165GB. At the same time, the zram memory limit is exhausted and the filesystem becomes read-only.



The zram figures confirm that we are still getting a 2.2 : 1 compression ratio between orig_data_size and compr_data_size, so why does the filesystem show much more free space than the zram device? Even if this is space already allocated for re-use by the filesystem, shouldn't it be reused rather than new space being allocated?
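
As a rough check, the ratio can be computed directly from the sysfs counters. This is a sketch assuming the same per-field interface (orig_data_size, compr_data_size) shown above; on newer kernels the figures come from mm_stat instead.

cd /sys/block/zram0
awk -v orig="$(cat orig_data_size)" -v compr="$(cat compr_data_size)" \
    'BEGIN { printf "compression ratio: %.2f : 1\n", orig / compr }'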



The data consists of a large number of small files which are added and removed at irregular intervals.



Answer



The cause turns out to be that when files are deleted from the ext4 filesystem living on the zram0 device, the freed blocks are not released back to the device, so the memory they occupy is never returned to the system. Although the space is available for the filesystem to reuse (which is what df reports), it is still counted as allocated memory in the /sys/block/zram0 statistics. As a result, memory usage climbs to 100% of the limit even though the filesystem considers itself only about half full because of the deletions.



This does mean that you can still fill the filesystem, and new files written over previously freed blocks will not use much additional memory; however, the stale data does drag the effective compression ratio down.



The solution is to mount the filesystem with the discard and noatime options. With discard, ext4 passes freed blocks down to the zram device, which releases the corresponding memory, and as a result the usage reported at the two levels matches again.
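
For example (the exact mount invocation and fstrim path below are assumptions; adjust them to your environment):

# Remount the existing filesystem with continuous discard enabled
/usr/bin/mount -o remount,discard,noatime /var/zram

# Equivalent /etc/fstab entry for future mounts:
# /dev/zram0  /var/zram  ext4  discard,noatime  0  0

# Alternatively, leave discard off and reclaim freed blocks in batches
/usr/sbin/fstrim -v /var/zram

If continuous discard adds too much overhead for a workload of many small files being created and deleted, running fstrim periodically (for example from cron) reclaims the freed memory in batches instead.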

