I am trying to mirror a fairly large site (20,000+ pages) before a major overhaul. Basically, I need a backup before cutting over to the new site (which will launch with about 1,000 pages) in case we missed something we need. The site runs on a CMS that I cannot easily extract usable data from, so I'm trying to make the copy with wget.
My problem is that wget does not appear to actually convert links, despite --convert-links (-k) being present in the command. I've tried a few different combinations of flags, but I haven't been able to get the output I need. My most recent failed attempt was:
nohup wget --mirror -k -l10 -P afscSnapshot --html-extension -R "*calendar*" -o wget.log http://www.example.org &
I've also tried --backup-converted, and --convert-links instead of -k (not that it should have mattered). I've run it with and without -P and -l, which again shouldn't matter.
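One thing I've since realized is that wget performs the -k link conversion only after the entire download finishes, so if the nohup'd process was killed (or is still crawling the 20,000+ pages), the conversion pass may never have run. As a sanity check, this is roughly what I'm using to look for the conversion phase in the log (a sketch; I'm assuming the verbose log mentions link conversion at the end of a completed run):

# did the conversion pass ever start/finish?
grep -i "convert" wget.log | tail
# see how the run ended
tail -n 5 wget.log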
The result is files that still contain absolute links like:
http://www.example.org/ht/d/sp/i/17770
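If conversion had run, I'd expect that to have been rewritten to a relative local path pointing at the downloaded copy, something like the following (hypothetical path, since --html-extension appends .html to pages served as text/html):

ht/d/sp/i/17770.html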