linux - Trouble using wget or httrack to mirror archived website

Tuesday, January 20, 2015

I am trying to use wget to create a local mirror of a website, but I am not getting all of the linked pages.

Here is the website:

http://web.archive.org/web/20110722080716/http://cst-www.nrl.navy.mil/lattice/

I don't want every page under web.archive.org, only the pages whose URL begins with http://web.archive.org/web/20110722080716/http://cst-www.nrl.navy.mil/lattice/.

When I run wget -r, the resulting directory tree contains

web.archive.org/web/20110722080716/http://cst-www.nrl.navy.mil/lattice/index.html,

but it is missing other files that are part of this database, e.g.

web.archive.org/web/20110808041151/http://cst-www.nrl.navy.mil/lattice/struk/d0c.html.
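Note that the two paths above carry different snapshot timestamps (20110722080716 and 20110808041151), so a plain prefix match on the first URL would not cover the second. To make the filter I am after concrete, here is a minimal Python sketch. The names `SNAPSHOT`, `wanted` and `original_url` are hypothetical, and the pattern assumes that the timestamp segment is always a plain run of digits.

```python
import re

# Assumption: an archived page has a URL of the form
#   http://web.archive.org/web/<timestamp>/<original URL>
# where <timestamp> is a run of digits. Only original URLs under
# cst-www.nrl.navy.mil/lattice/ are accepted.
SNAPSHOT = re.compile(
    r"^https?://web\.archive\.org/web/(\d+)/"
    r"(https?://cst-www\.nrl\.navy\.mil/lattice/.*)$"
)


def wanted(url):
    """Return True if url is an archived copy of a page under /lattice/."""
    return SNAPSHOT.match(url) is not None


def original_url(url):
    """Return the original (pre-archive) URL, or None if url is not wanted."""
    match = SNAPSHOT.match(url)
    return match.group(2) if match else None
```

With this filter, both the index page and struk/d0c.html would be accepted whatever their snapshot timestamp, while any other page on web.archive.org would be rejected.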

Perhaps httrack would do better, but so far it grabs far too much.

So, how can I grab a local copy of an archived website from the Internet Archive Wayback Machine?
