Thursday, February 5, 2015

Make wget convert HTML links to relative after download if -k wasn't specified


The -k option (or --convert-links) converts the links in your web pages to relative ones after the download finishes. As the man page says:



After the download is complete,
convert the links in the document to
make them suitable for local viewing.
This affects not only the visible
hyperlinks, but any part of the
document that links to external
content, such as embedded images,
links to style sheets, hyperlinks to
non-HTML content, etc.



So, if I didn't specify -k, can I run wget again after the download to fix that, and if so, what would be the proper command? My guess is wget -c [previous options used] [url], run in the same working directory the files were downloaded to.


Answer



Yes, you can make wget do it. I'd say use wget -nc -k [previous options] [previous url]. -nc is no-clobber. From the man page:



When -nc is specified, this behavior is
suppressed, and Wget will refuse to
download newer copies of file.



The -k option does the link conversion. So wget crawls the site again, sees all the files you already have, refuses to re-download them, and then rewrites the HTML links to relative ones when it's done. Nice.
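As a minimal sketch of that follow-up run: the URL and the "previous options" below are placeholders (assumptions, not taken from the question), so substitute whatever you used the first time. The script only builds and prints the command so you can check it before running it from the directory that holds the earlier download.

```shell
#!/bin/sh
# Hypothetical values: substitute the URL and options from your original run.
url="https://example.com/docs/"
prev_opts="-r -np -p"   # recursive, no parent, page requisites

# Original run, made without -k (shown for reference only):
#   wget $prev_opts "$url"

# Follow-up run from the same working directory: add -nc and -k.
cmd="wget -nc -k $prev_opts $url"

# Print the command for review; run it with: eval "$cmd"
echo "$cmd"
```

Behavior may differ between wget versions. Some newer builds appear to warn that --no-clobber and --convert-links were both given and then use only --convert-links, in which case the files are fetched again before the links are converted. It is worth watching wget's output on the first few files to see which behavior you get.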

