Monday, December 22, 2014

performance - Optimal ARC and L2ARC settings for purpose specific storage application

I am configuring a server that runs 3 ZFS pools, 2 of which are rather purpose specific and I feel like the default recommendations are simply not optimized for them. Networking is facilitated by dual 10gbit adapters.




Pool 1 is a big file storage, it contains raw video data that is rarely written and read, and also occasional backups. There is absolutely no point in caching anything from that pool, as it is high bandwidth data that is read through in one sweep beginning to end, caching anything from it will be a complete waste of memory. Latency is not that much of an issue, and bandwidth is ample due to highly compressible data. The pool is made of 8 HDDs in z2 mode, usable capacity of 24TB.



Pool 2 is compressed video frames storage. Portions of this content are frequently read when compositing video projects. The portion of frequently used data is usually higher than the total amount of RAM the server has, there is a low latency requirement, but not ultra low, bandwidth is more important. The pool is made of 3 HDDs in z1, usable capacity of 8TB, and a 1TB NVME SSD for L2ARC.



Pool 3 is general storage used as storage for several computer systems that boot and run software from it rather than local storage. Since it has to service several machines and primary system storage, the requirements for latency and bandwidth here are the highest. This pool is mostly read from, writes are limited to what the client systems do. The pool is made of 3 SATA SSDs in z1 mode, 1TB of usable capacity.



My intent at optimization has to do with minimizing the ARC size for the first two pools in order to maximize the ARC size for the third one.



Pool 1 has no benefit from caching whatsoever, so what's the minimum safe amount of ARC I can set for it?




Pool 2 can benefit from ARC but it is not really worth it, as L2ARC is fast enough for the purpose and the drive has 1 TB of capacity. Ideally, I would be happy if I could get away without using any ARC for this volume, and using the full terabyte of L2ARC, but it seems that at least some ARC is needed for L2ARC header data.



So considering L2ARC capacity of 1 TB and pool record size of 64k, 1tb / 64kb * 70b gives me ~0.995gb. Does this mean I can safely cap the ARC for that pool at 1GB? Or maybe it needs more?



It seems that the ARC contains both read cache as well as the information to handle the L2ARC, so it looks like what I need is some option to give emphasis to managing a larger L2ARC than bother with caching actual data in RAM. And if necessary, mandate that any cache evictions from ARC are moved to L2ARC in the event cache eviction policies do not abide to usual caching hierarchy policies.



The general recommendations I've read suggest about 1GB of RAM per 1TB of storage, I am planning 32GB of RAM per 33 TB of storage which I am almost dead on, but 4 or 5 to 1 for L2ARC vs ARC, which I fall short of by quite a lot. The goal is to cut pool 1 ARC as low as possible, and cut pool 2 ARC to only as much as it needs in order to be able to utilize the whole 1TB of L2ARC, in order to maximize the RAM available for ARC for pool 3.

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...