Sunday, April 19, 2015

linux - Estimating compressed file size using a list parameter

I am currently compressing a list of files from a directory with the following command:



tar -cvjf test_1.tar.bz2 --no-recursion -T test_1.lst



The above command compresses only the files named in the list. I do it this way because the list is generated so that the files fit on a DVD. However, the list is built from the uncompressed sizes, so the compressed archive turns out smaller than estimated and a lot of space is left unused on the DVD. This resembles the knapsack problem.


I would like to estimate the compressed file size and add some more files to the list. I found that it is possible to estimate file size using the following command:



tar -cjf - Folder/ | wc -c



This command works on a folder rather than a list of files. Is there a way to estimate the compressed size for the files in a list? I am also open to other options, such as Perl scripts.
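A minimal sketch of how the estimate might be taken from a list instead of a folder: the same pipeline, with -T supplying the file names. The function name estimate_list_size is hypothetical. --no-recursion is placed before -T on the assumption that newer GNU tar versions apply it only to the names that follow it; that ordering should be harmless on older versions.

```shell
# Hypothetical helper: print the size in bytes of the bzip2-compressed
# archive that tar would produce for the files named in a list file
# (one path per line). Nothing is written to disk.
estimate_list_size() {
    # Assumption: --no-recursion comes first so it covers the names read via -T.
    tar -cjf - --no-recursion -T "$1" | wc -c
}

# Example call (uses the list file from the question):
#   estimate_list_size test_1.lst
```

Note that this compresses everything in the list just to count bytes, so each estimate costs about as much time as creating the real archive.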


Edit:


I think I should provide more information, since I have been searching the web a lot. I came across a Perl script (link) that roughly emulates the knapsack algorithm.


The problem with that script is that it splits the files into groups based on their original, uncompressed sizes. When I compress each group afterwards, there is room left for more files, which I consider inefficient.


There are 2 ways I could resolve the inefficiency:


a) Compress each file individually and save the results in a directory using a script. The compressed sizes would give the best estimate. I could then generate the list from the folder of compressed files and apply it to the uncompressed originals.
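Option (a) could be sketched without keeping the compressed copies at all, by piping each file through bzip2 and recording only the byte count. The function name compressed_sizes and the tab-separated output format are assumptions made for illustration. The per-file figures are only approximate, since the real archive also carries tar headers and padding and is compressed as one stream.

```shell
# Hypothetical helper: for each path in a list file, print
# "<compressed bytes><TAB><path>" without storing any compressed copy.
compressed_sizes() {
    while IFS= read -r path; do
        [ -f "$path" ] || continue      # skip directories and missing entries
        # $(( ... )) strips the padding some wc implementations add.
        size=$(( $(bzip2 -c -- "$path" | wc -c) ))
        printf '%d\t%s\n' "$size" "$path"
    done < "$1"
}

# Example call, writing the table to a hypothetical file:
#   compressed_sizes test_1.lst > test_1.sizes
```

The resulting table could be fed to a knapsack-style script in place of the uncompressed sizes.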


b) Check whether the compressed archive is smaller than the required size. If so, keep adding files until the limit is reached. However, choosing which files to add to the archive is an optimization problem in itself.
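Option (b) might look like the greedy sketch below: each candidate is added to a trial list, the archive size is re-estimated, and the candidate is kept only if the result still fits. The function name fill_list, its arguments, and the file names in the example are hypothetical, and both list files are assumed to end with a newline. This is a simple first-fit pass, not an optimal knapsack solution, and it recompresses the whole list for every candidate, so it is slow for large sets.

```shell
# Hypothetical helper: fill_list BASE_LIST CANDIDATE_LIST LIMIT_BYTES
# Tries each candidate in order and appends it to BASE_LIST only if the
# estimated bzip2 archive size stays within LIMIT_BYTES.
fill_list() {
    base=$1 candidates=$2 limit=$3
    trial=$(mktemp) || return 1
    while IFS= read -r path; do
        cat "$base" > "$trial"              # start from the accepted files
        printf '%s\n' "$path" >> "$trial"   # add one candidate
        size=$(( $(tar -cjf - --no-recursion -T "$trial" < /dev/null | wc -c) ))
        if [ "$size" -le "$limit" ]; then
            cat "$trial" > "$base"          # candidate fits: keep it
        fi
    done < "$candidates"
    rm -f "$trial"
}

# Example call; 4700000000 is a placeholder for a nominal 4.7 GB disc and
# should be lowered to leave room for file system overhead:
#   fill_list test_1.lst extra.lst 4700000000
```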
