Sunday, January 29, 2017

linux - How to delete millions of files without disturbing the server




I'd like to delete an nginx cache directory, which I quickly purged by running:



mv cache cache.bak
mkdir cache
service nginx restart


Now I have a cache.bak folder containing 2 million files. I'd like to delete it without disturbing the server.



A simple rm -rf cache.bak trashes the server: even the simplest HTTP response takes 16 seconds while rm is running, so I cannot do that.




I tried ionice -c3 rm -rf cache.bak, but it didn't help. The server has an HDD, not an SSD; on an SSD this would probably not be a problem.



I believe the best solution would be some kind of throttling, like what nginx's built-in cache manager does.



How would you solve this? Is there any tool which can do exactly this?



ext4 on Ubuntu 16.04


Answer



I got many useful answers and comments here, which I'd like to summarize along with my own solution.





  1. Yes, the best way to prevent such a thing from happening is to keep the cache dir on a separate filesystem. Nuking / quick-formatting a filesystem always takes a few seconds (maybe minutes) at most, regardless of how many files / dirs were on it. (A rough sketch of that setup is at the end of this answer.)


  2. The ionice / nice solutions didn't do anything, because the deleting process itself caused almost no I/O. What caused the I/O was, I believe, kernel / filesystem level queues and buffers filling up when files were deleted too quickly by the delete process. (One way to observe this is sketched right after this list.)


  3. The way I solved it is similar to Tero Kilkanen's solution, but it didn't require a shell script. I used rsync's built-in --bwlimit switch to limit the deletion speed.
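
As an aside on point 2, one rough way to see those buffers filling up (this is just a diagnostic sketch of mine, not something from the original discussion) is to watch the kernel's dirty / write-back page counters while a bulk delete is running:

# Dirty and Writeback climb steeply during an unthrottled rm -rf
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

If the sysstat package is installed, iostat -x 5 shows how busy the disk itself is at the same time.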




The full rsync command was:



mkdir empty_dir

rsync -v -a --delete --bwlimit=1 empty_dir/ cache.bak/


Now, bwlimit specifies bandwidth in kilobytes, which in this case was applied to the filename or path of the files. By setting it to 1 KBps, it was deleting around 100,000 files per hour, or about 27 files per second. Files had relative paths like cache.bak/e/c1/db98339573acc5c76bdac4a601f9ec1e, which is 47 characters long, so 1000/47 ~= 21 files per second, roughly in line with my estimate of 100,000 files per hour.
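
If you'd rather estimate the rate up front than guess, here is a quick sketch of my own (not part of the original discussion) that samples the relative path lengths and converts them into an approximate files-per-second figure at --bwlimit=1. It assumes, as described above, that the limit is charged roughly one byte per path character:

# Sample 10,000 paths and estimate the deletion rate at --bwlimit=1 (~1000 bytes/sec).
# find prints "./..." prefixes, slightly different from what rsync sends, so treat this as a ballpark.
cd cache.bak
find . -type f | head -n 10000 | awk '{ total += length($0); n++ } END { if (n) printf "avg %.0f chars/path -> ~%.0f files/sec\n", total/n, 1000 * n / total }'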



Now why --bwlimit=1? I tried various values:




  • 10000, 1000, 100 -> the system slowed down like before

  • 10 -> the system worked quite well for a while, but produced partial slowdowns once a minute or so; HTTP response times stayed under 1 second (one way to watch them while tuning is sketched after this list).


  • 1 -> no system slowdown at all. I'm not in a hurry, and 2 million files can be deleted in under a day this way, so I chose it.
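
The bullet points above mention HTTP response times; a simple way to keep an eye on them from another terminal while tuning --bwlimit (my own sketch, with http://localhost/ as a placeholder for whatever the server actually serves) is:

# Print the total response time once a second; the URL is a placeholder
while true; do curl -o /dev/null -s -w '%{time_total}\n' http://localhost/; sleep 1; done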



I like the simplicity of rsync's built-in method, but this solution depends on the length of the relative paths. Not a big problem, as most people would find the right value via trial and error anyway.
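
Finally, a footnote on point 1: this is roughly how a dedicated filesystem for the cache could be set up, so that the next purge is a quick re-format rather than a per-file delete. It is only a sketch of mine; the image path, the 10G size and the mount point are placeholders, and the loopback image just stands in for a real spare partition or LVM volume:

# Create and mount a dedicated filesystem for the nginx cache
fallocate -l 10G /var/cache-fs.img
mkfs.ext4 -F /var/cache-fs.img
mkdir -p /var/cache/nginx
mount -o loop /var/cache-fs.img /var/cache/nginx

# Later, purging the whole cache becomes a re-format instead of deleting millions of files
service nginx stop
umount /var/cache/nginx
mkfs.ext4 -F /var/cache-fs.img
mount -o loop /var/cache-fs.img /var/cache/nginx
service nginx start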


