Sunday, October 25, 2015

linux - Storing many small files with xattr in XFS and ext4



I have many small files (20 million) with xattrs on XFS drives. The average file size is around 20KB and the average xattr size is about 512 bytes.



Because I would like to move them to ext4, I tested copying a subset of these files onto ext4 (with inode sizes of 512 and 1024) and XFS drives.



I formatted 3 drives with the following options respectively:



# mkfs.ext4 -i 8192 -I 512 /dev/sde1
# mkfs.ext4 -i 8192 -I 1024 /dev/sdf1

# mkfs.xfs -f -i size=1024 /dev/sdg1


Then I copied about 30GB of small files with cp -r --preserve=mode,ownership,timestamps,xattr.



The result is as follows:



Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sde1      133654640 37164884  89360916  30% /srv/node/sde1   # ext4 -I 512
/dev/sdf1      124741744 36645652  80967252  32% /srv/node/sdf1   # ext4 -I 1024
/dev/sdg1      142507224 31020968 111486256  22% /srv/node/sdg1   # XFS


ext4 uses many more blocks than XFS in both cases. What causes this difference? How can I use ext4 without wasting disk space?
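
To put a number on the gap, the Used figures from the df output above can be compared directly. This is only a minimal sketch: the helper name extra_pct is made up for illustration, and it assumes an awk is available on the system.

```shell
# Hypothetical helper: by what percentage is $1 (KB used) larger than $2 (KB used)?
extra_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (a - b) / b }'
}

extra_pct 37164884 31020968   # ext4 -I 512  vs XFS
extra_pct 36645652 31020968   # ext4 -I 1024 vs XFS
```

With the figures above, both ext4 filesystems report a bit under 20% more used space than XFS for the same set of files.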


Answer



One thing to note is that XFS does not perform well with millions of small files.
XFS also handles inodes differently: it allocates them dynamically as files are created, whereas ext4 creates all of its inode tables at mkfs time.
The block counts differ because you have lots of small files. Within its block groups, ext4 also stores indirect block maps, extent tree blocks, extended attributes that do not fit in the inode, inode tables, etc., which increases the total number of blocks used for the same set of files. The journal also takes some space.
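
As a rough illustration of the inode space ext4 sets aside, the share of the device taken by inode tables follows from the two mkfs.ext4 options used in the question: -i (bytes per inode) and -I (inode size). The sketch below covers only that preallocated space, not extent blocks, xattr blocks, or the journal, and the helper name inode_table_pct is hypothetical.

```shell
# Hypothetical helper: percent of the device reserved for ext4 inode tables.
# $1 = bytes-per-inode (mkfs.ext4 -i), $2 = inode size in bytes (mkfs.ext4 -I)
inode_table_pct() {
    awk -v bpi="$1" -v isize="$2" 'BEGIN { printf "%.2f\n", 100 * isize / bpi }'
}

inode_table_pct 8192 512    # the -I 512 filesystem
inode_table_pct 8192 1024   # the -I 1024 filesystem
```

That is 6.25% and 12.50% of the device respectively, which appears consistent with the smaller 1K-blocks totals that df reports for the two ext4 filesystems compared with XFS. A larger -i value (fewer inodes) would shrink this overhead, provided enough inodes remain for all the files.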
For ext4 you could remove the journal if you want, but filesystem recovery time after an unclean shutdown will increase significantly. You could also reserve a smaller percentage of blocks for the superuser with the -m option (the default is 5%), lowering it to 1%, for example.
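
The two adjustments can be applied to an existing ext4 filesystem with tune2fs. The sketch below only prints the commands rather than running them, since they need root and, for the journal change, an unmounted filesystem. The helper name ext4_tuning_cmds is made up for illustration, and /dev/sde1 is simply the device from the question.

```shell
# Hypothetical helper: print (do not run) the tuning commands for a device.
ext4_tuning_cmds() {
    dev="$1"
    # Lower the blocks reserved for the superuser from the default 5% to 1%.
    echo "tune2fs -m 1 $dev"
    # Drop the journal; unmount the filesystem (or mount it read-only) first.
    echo "tune2fs -O ^has_journal $dev"
}

ext4_tuning_cmds /dev/sde1
```

Note that the reserved percentage changes the Available column in df rather than Used, so it recovers usable space without changing how many blocks the files themselves occupy.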

