Wednesday, July 6, 2016

linux - Copying a large directory tree locally? cp or rsync?



I have to copy a large directory tree, about 1.8 TB. It's all local. Out of habit I'd use rsync, however I wonder if there's much point, and if I should rather use cp.



I'm worried about permissions and uid/gid, since they have to be preserved in the copy (I know rsync does this). As well as things like symlinks.



The destination is empty, so I don't have to worry about conditionally updating some files. It's all local disk, so I don't have to worry about ssh or network.



The reason I'd be tempted away from rsync, is because rsync might do more than I need. rsync checksums files. I don't need that, and am concerned that it might take longer than cp.




So what do you reckon, rsync or cp?


Answer



I would use rsync as it means that if it is interrupted for any reason, then you can restart it easily with very little cost. And being rsync, it can even restart part way through a large file. As others mention, it can exclude files easily. The simplest way to preserve most things is to use the -a flag – ‘archive.’ So:



rsync -a source dest


Although UID/GID and symlinks are preserved by -a (see -lpgo), your question implies you might want a full copy of the filesystem information; and -a doesn't include hard-links, extended attributes, or ACLs (on Linux) or the above nor resource forks (on OS X.) Thus, for a robust copy of a filesystem, you'll need to include those flags:




rsync -aHAX source dest # Linux
rsync -aHE source dest # OS X


The default cp will start again, though the -u flag will "copy only when the SOURCE file is newer than the destination file or when the destination file is missing". And the -a (archive) flag will be recursive, not recopy files if you have to restart and preserve permissions. So:



cp -au source dest

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...