Wednesday, September 3, 2014

Command-line HTTP crawler for Windows?






Would somebody have a recommendation for a web site crawler that can be invoked and equipped with settings from the command line?


This would need to run in a Windows environment.


Saving the data, following stylesheet links etc. is not an issue. I only need the crawler to start with a page, parse it, and follow all the links on the same domain so that in the end, all pages on the site have been requested once.


Background: I'm setting up a web site that gets frequently uploaded from an office location. Combining data from various sources, it has several levels of caching. I don't want the first user to visit the site after a fresh upload to have to wait until the page has been generated and saved in the cache.


Answer



wget --mirror


No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...