I have over 10,000 files from older versions of Mac OS that have no filename extensions. They are deeply nested in the directory structure, and their names contain all sorts of strange formatting and characters. They no longer have file types or creator codes attached. Many of these files contain text that would let me determine the right extension (for example, Word.Document.8 appears in the text of every file created by that version of Word).
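Since the identifying text sits inside the files themselves, one way to cover several file types in a single pass is a small signature table. This is a minimal bash sketch, not a tested tool: `SIGNATURES` and `guess_ext` are hypothetical names, and only the Word.Document.8 → doc pair comes from the question. Any other pairs would need to be confirmed against your own files.

```bash
#!/bin/bash
# Table of "signature|extension" pairs. Only the Word.Document.8 entry comes
# from the question; any other pairs are assumptions to confirm on real files.
SIGNATURES=(
  'Word.Document.8|doc'
)

# guess_ext FILE: print the extension for the first signature found in
# FILE's contents, or print nothing and return 1 if none matches.
guess_ext() {
  local file=$1 entry sig
  for entry in "${SIGNATURES[@]}"; do
    sig=${entry%|*}
    # -F: match the text literally (the dots are not regex wildcards)
    # -a: search binary files as if they were text
    if grep -q -a -F -- "$sig" "$file"; then
      printf '%s\n' "${entry##*|}"
      return 0
    fi
  done
  return 1
}
```

For example, `ext=$(guess_ext "./Old Folder/Memo") && echo "$ext"` prints `doc` when the file contains Word.Document.8. Signatures are tried in table order, so put the more specific ones first if two could match the same file.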
I found a script that looks like it would work for one of these file types at a time, but it cuts off the part of a filename that follows certain problem characters, which is not good.
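find . -type f -not -name "*.*" -print0 |  # extensionless files, NUL-separated
xargs -0 file |                            # print "name: description" for each
grep 'Word.Document.8' |                   # keep lines mentioning Word.Document.8
sed 's/:.*//' |                            # strip ": description" (also cuts names at their first ":")
xargs -I % echo mv % %.doc                 # dry run: print each mv command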
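The truncation comes from `sed 's/:.*//'`: `file` prints `name: description`, and that expression deletes everything from the first colon on the line, so any name that itself contains a colon is cut short. The final `xargs -I %` also reads newline-separated input and treats quotes and backslashes specially. Below is a minimal bash sketch that avoids both problems, assuming (as the question says) that the signature appears in the file's own bytes. It greps the contents directly instead of parsing `file` output, and keeps the NUL separation from `find` all the way through. `tag_word_docs` and `DRY_RUN` are hypothetical names; like the original, it only prints the `mv` commands unless `DRY_RUN=0` is set. The bash features it uses should also be available in the bash 3.2 that ships with 10.6, but try it on a copy first.

```bash
#!/bin/bash
# tag_word_docs DIR: find extensionless files under DIR whose contents
# contain "Word.Document.8" and give them a .doc extension.
# Prints the mv commands (dry run) unless DRY_RUN=0 is set.
tag_word_docs() {
  local dir=$1
  # -print0 / read -d '': names stay NUL-separated end to end, so spaces,
  # quotes, colons and even newlines in names pass through untouched.
  find "$dir" -type f ! -name '*.*' -print0 |
  while IFS= read -r -d '' f; do
    # Search the file's own bytes: -F literal match, -a binary as text.
    if grep -q -a -F -- 'Word.Document.8' "$f"; then
      if [ -e "$f.doc" ]; then
        # Never overwrite an existing file.
        printf 'skipping, target exists: %s.doc\n' "$f" >&2
      elif [ "${DRY_RUN:-1}" = 1 ]; then
        # %q quotes the names so the printed command is safe to paste.
        printf 'mv -- %q %q\n' "$f" "$f.doc"
      else
        mv -- "$f" "$f.doc"
      fi
    fi
  done
}
```

Run `tag_word_docs .` to review the planned renames, then `DRY_RUN=0 tag_word_docs .` to perform them. Because each name is only ever passed as a quoted variable, it is never re-parsed, so nothing needs to be cleaned beforehand.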
Should I clean up the characters in the filenames first, or handle them in the script so the names stay exactly as they are? As long as I lose no information from the filenames, I don't mind cleaning out slashes and other problem characters. However, cleaning the filenames is likely to create duplicates, so any cleaning script would have to add something like "-1" before the extension to make sure nothing gets lost.
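If the names are cleaned, the rename step can do both jobs at once: replace the problem characters and, when the cleaned name is already taken, count up -1, -2, ... before the extension. This is a minimal bash sketch; `clean_name`, `unique_path` and `safe_rename` are hypothetical helpers. Which characters count as problems is an assumption. This version replaces only colons and control characters (tabs, newlines) with "_". On HFS+ volumes a "/" in a Finder name is normally seen as ":" from the shell, so the colon rule should also cover those slashes. Replacing characters does merge some names ("a:b" and "a_b" clean to the same thing), and that is exactly the case the numeric suffix keeps from overwriting anything.

```bash
#!/bin/bash
# clean_name NAME: print NAME with colons and control characters
# (tab, newline, ...) replaced by "_". The character set is an
# assumption; widen the bracket pattern if you need to.
clean_name() {
  local name=$1
  printf '%s' "${name//[[:cntrl:]:]/_}"
}

# unique_path DIR BASE EXT: print DIR/BASE.EXT, or the first of
# DIR/BASE-1.EXT, DIR/BASE-2.EXT, ... that does not exist yet.
unique_path() {
  local dir=$1 base=$2 ext=$3 n=0 candidate
  candidate="$dir/$base.$ext"
  while [ -e "$candidate" ]; do
    n=$((n + 1))
    candidate="$dir/$base-$n.$ext"
  done
  printf '%s\n' "$candidate"
}

# safe_rename FILE EXT: clean FILE's name, add EXT (with a -N suffix if
# needed so nothing is overwritten), rename it, and print the new path.
safe_rename() {
  local src=$1 ext=$2 dir name base dest
  case $src in
    */*) dir=${src%/*} ;;
    *)   dir=. ;;
  esac
  name=${src##*/}
  base=$(clean_name "$name")
  dest=$(unique_path "$dir" "$base" "$ext")
  mv -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

For example, `safe_rename "./Old Folder/Letter: draft" doc` renames the file to `./Old Folder/Letter_ draft.doc` if no file by that name exists yet, or to `Letter_ draft-1.doc` if one does. Unlike a dry run, this renames immediately, so test it on a copy of the tree first.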
I'm not tied to this script, but it is understandable, which is a plus. This file server runs Mac OS X 10.6, but I have access to any recent version of OS X.