Saturday, November 29, 2014

script - How to add extensions to a lot of files using content of each file?

I've got over 10,000 files from older versions of the Mac OS that don't have extensions. They're deeply nested in the directory structure, and their names contain all sorts of strange formatting and characters. They no longer have file types or creator codes attached to them. Many of these files contain text that would let me determine the right extension (for example, Word.Document.8 appears in every file created by that version of Word).


I found a script that looks like it would work for one of these file types at a time, but it cuts off the part of a filename that follows certain problem characters, which would lose information.


find . -type f -not -name "*.*" -print0 |\
xargs -0 file |\
grep 'Word.Document.8' |\
sed 's/:.*//' |\
xargs -I % echo mv % %.doc

Should I clean the characters in the filenames first, or programmatically deal with those in the script in order to leave them the same? As long as I lose no information from the filenames, I don't see a problem cleaning out slashes and other problem characters. Also, if I clean the filenames, there are likely to be duplicates, so any cleaning script would have to add something like "-1" before the extension to make sure nothing gets lost.
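For reference, here is a minimal sketch of the "deal with it in the script" option, under a few assumptions. The truncation in the pipeline above most likely comes from `sed 's/:.*//'`, which cuts each path at its first colon (a slash typed in the Finder shows up as a colon at the shell level), and from `xargs -I`, which interprets quotes and backslashes. The sketch avoids both by never turning filenames into text: `find` hands each path directly to a small inline shell loop. It also searches the file contents with `grep` instead of parsing the output of `file`, which is a swapped-in technique, not what the script above does. The function name `add_ext` and its arguments are hypothetical.

```shell
#!/bin/sh
# add_ext DIR MARKER EXT [run]   (hypothetical helper, not an existing tool)
# Appends .EXT to every extensionless regular file under DIR whose contents
# contain the fixed string MARKER. Without "run" it only prints what it would do.
add_ext() {
    dir=$1 marker=$2 ext=$3 mode=${4:-dry}
    # LC_ALL=C makes grep treat file contents as plain bytes.
    LC_ALL=C find "$dir" -type f ! -name '*.*' -exec sh -c '
        mode=$1 marker=$2 ext=$3
        shift 3
        for f in "$@"; do
            # -F: fixed string, -q: no output, only the exit status matters
            if grep -qF -e "$marker" -- "$f"; then
                target=$f.$ext
                n=1
                # Never overwrite: fall back to name-1.ext, name-2.ext, ...
                while [ -e "$target" ]; do
                    target=$f-$n.$ext
                    n=$((n + 1))
                done
                if [ "$mode" = run ]; then
                    mv -- "$f" "$target"
                else
                    printf "would rename: %s -> %s\n" "$f" "$target"
                fi
            fi
        done
    ' sh "$mode" "$marker" "$ext" {} +
}
```

Running `add_ext . 'Word.Document.8' doc` prints the planned renames, and adding `run` as a fourth argument performs them. Because the names are passed as arguments and always quoted, spaces, colons, quotes, and percent signs are left exactly as they are, so no separate cleaning pass is needed for the rename itself.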


I'm not tied to this script, but it is understandable, which is a pro. Mac OS X 10.6 is installed on this file server, but I've got access to any recent versions of OS X.
