Command-line HTTP crawler for Windows?

Wednesday, September 3, 2014

Can anybody recommend a website crawler that can be invoked and configured entirely from the command line?

This would need to run in a Windows environment.

Saving the data, following stylesheet links and so on are not required. I only need the crawler to start with a page, parse it, and follow all the links on the same domain so that, in the end, every page on the site has been requested once.

Background: I'm setting up a web site that is frequently uploaded from an office location. It combines data from various sources and has several levels of caching. I don't want the first user who visits the site after a fresh upload to have to wait while each page is generated and saved in the cache.

Answer

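wget --mirror

GNU wget has Windows builds. The --mirror option turns on recursive retrieval with unlimited depth, and by default the recursion stays on the starting host. Since the downloaded files are not needed here, adding --delete-after makes wget remove each file once it has been fetched.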
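Where wget is not an option, the crawl described in the question (start with a page, parse it, follow same-domain links, request each page once) can be sketched with the Python standard library alone. This is a minimal sketch, not a tested production tool: the names LinkParser, fetch_html and crawl and the script name warm_cache.py are made up for illustration, and it assumes "same domain" means the same host and port as the start URL. It only follows <a href> links and ignores robots.txt, redirects to other hosts, and rate limiting.

```python
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Request a URL and return its body as text, or "" if it is not HTML."""
    with urlopen(url, timeout=30) as response:
        if response.headers.get_content_type() != "text/html":
            return ""
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html):
    """Request every same-host page reachable from start_url once.

    Returns the set of URLs that were queued for a request.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            print(f"failed: {url} ({error})")
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    pages = crawl(sys.argv[1])
    print(f"requested {len(pages)} pages")
```

Saved as warm_cache.py (a hypothetical name), it could be run after each upload as: python warm_cache.py http://www.example.com/ where the URL is a placeholder for the real site. The fetch parameter exists so the crawl logic can be exercised without network access.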
