Monday, November 17, 2014

bash - directory with millions of files: memory efficient way to list files (linux / ext4)



Unfortunately I have an application that puts millions of files in one flat directory (without any subdirectories).



If I run an ls or a find on that directory, ls or find consumes several gigabytes of RAM.



I guess the reason is that ls and find read all the file names of the directory into RAM.




My question is:



Is there any way to list the files of this directory without consuming so much memory?



Any solution (special options / different commands / a C program to compile / a special Python module) would be interesting.


Answer



There is:



The ls command not only reads the file names, it also stat()s every file. If you use the opendir() / readdir() / closedir() sequence instead, you will do much better.
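
As a rough illustration, here is a minimal C sketch of that opendir() / readdir() / closedir() loop (the file and program names are my own choice). It prints one entry at a time and never stat()s anything, so its memory use stays small no matter how many files the directory holds:

/* list_dir.c - print the entries of one directory without building
 * the whole listing in memory and without stat()ing the files.
 * Compile with: gcc -O2 list_dir.c -o list_dir
 * Run with:     ./list_dir /path/to/huge/dir
 */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }

    DIR *dir = opendir(argv[1]);
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    /* readdir() hands back one entry per call, so memory use stays
     * small and roughly constant regardless of directory size. */
    while ((entry = readdir(dir)) != NULL) {
        /* skip the "." and ".." entries */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        puts(entry->d_name);
    }

    closedir(dir);
    return 0;
}

If you would rather not compile anything, Python's os.scandir() is built on the same readdir() mechanism and yields entries one at a time instead of building a full list.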




In addition to that, the resources needed to list a directory are also a function of the file system: XFS uses much less than ext4.

