Sunday, October 12, 2014

windows - How to navigate PHP-based pagination with WGET?

I'm trying to come up with a list of possible names for our next baby boy and have been looking through the site IndiaParenting. Each name on the site has a detail page at a URL like this one: http://www.indiaparenting.com/babynames/meaning-of-Aadesh.shtml. I'd like the name to be similar to our firstborn's, so I'm trying to do the following:



  • WGET all pages from the site whose URLs contain "meaning-of" into a single folder on my hard drive, with something like wget -nc -c -nd -r -l1 -k http://www.indiaparenting.com/babynames/hindu-boy-names.php -A "meaning-of*" -I /babynames

  • Do something like dir > filenames.txt to put all the filenames into a single text file.

  • Search the generated file with a regex to find possible names. Our first son's name is Ranveer, and we're looking for names that start with either N or R, so the regex is probably something like: [NR][aeiou][^aeiou][^aeiou][aeiou]{2}[^aeiou].

  • Manually go through the final list with the Madam and choose a name!
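Steps 2 and 3 can be sketched in Node.js. This is a minimal sketch, not a finished tool: the helper names nameFromLine and candidateNames are hypothetical, and it assumes every saved page follows the meaning-of-<Name>.shtml pattern of the example URL. The filename match is unanchored, so it also works on the full lines that a plain `dir` prints (date, size, then filename), while the name pattern gets a leading ^ so the name must start with N or R.

```javascript
// Hypothetical helpers for steps 2 and 3 (not part of any library).
// The pattern from step 3, anchored with ^ so the match must start at the
// first letter of the name.
const NAME_PATTERN = /^[NR][aeiou][^aeiou][^aeiou][aeiou]{2}[^aeiou]/;

// Pull the name out of a line such as "meaning-of-Aadesh.shtml". The match is
// not anchored, so it also works on a full `dir` line. Returns null when the
// line contains no meaning-of-*.shtml filename.
function nameFromLine(line) {
  const m = /meaning-of-([^.\s]+)\.shtml/.exec(line);
  return m ? m[1] : null;
}

// Keep the names that match NAME_PATTERN, without duplicates, sorted A-Z.
function candidateNames(lines) {
  const names = new Set();
  for (const line of lines) {
    const name = nameFromLine(line);
    if (name !== null && NAME_PATTERN.test(name)) names.add(name);
  }
  return [...names].sort();
}
```

For example, candidateNames(["meaning-of-Ranveer.shtml", "meaning-of-Navneet.shtml", "meaning-of-Nitin.shtml"]) returns ["Navneet", "Ranveer"]. The lines can come from fs.readFileSync("filenames.txt", "utf8").split(/\r?\n/). Two caveats: [^aeiou] matches any character that is not a lowercase vowel (including "y"), and the pattern has no trailing $, so longer names that merely begin with this shape also match.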


The problem I'm having is with the wget. The page is generated by PHP, and the page navigator at the bottom doesn't link to other pages' URLs as usual; it appears to call a JavaScript function instead:


[Screenshot: paging section]


I looked at the page source and found the JavaScript function pagingFunction:


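function pagingFunction(labelName){
    // Copy the clicked page label into the field with id "pageNum"...
    vpage = document.getElementById("pageNum");
    pageNm = labelName;
    vpage.value = pageNm;
    // ...then submit the form with id "frmPaging" instead of following a link.
    document.getElementById("frmPaging").submit();
}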
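In other words, each results page is the response to a form submission, not a separate URL. The snippet does not show the form itself, so the sketch below assumes that "frmPaging" uses method="post", that its action is the same hindu-boy-names.php URL, and that the field with id "pageNum" also has name="pageNum". All three should be checked against the form's HTML. The helper pagingRequest is hypothetical; it only builds the request a browser would send.

```javascript
// Sketch: the HTTP request that pagingFunction(labelName) triggers.
// ASSUMPTIONS (not visible in the snippet): the form uses POST, posts back to
// hindu-boy-names.php, and the "pageNum" field is also named "pageNum".
const PAGING_URL = "http://www.indiaparenting.com/babynames/hindu-boy-names.php";

// Build the URL and fetch() options for one page of results.
function pagingRequest(pageLabel) {
  // Browsers send ordinary form fields as application/x-www-form-urlencoded.
  const body = new URLSearchParams({ pageNum: String(pageLabel) }).toString();
  return {
    url: PAGING_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body,
    },
  };
}
```

Under the same assumptions, the wget equivalent for one page would be wget --post-data="pageNum=2" followed by the URL. If the form turns out to use GET, the field goes in the query string instead (hindu-boy-names.php?pageNum=2), and plain wget URLs would work.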

Question: I had thought that recursively WGETting the pages would go page by page, but it does not. Is there a way to handle this with WGET? If not, is there another option?
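Part of the answer is that wget does not execute JavaScript, so a recursive wget only follows plain href links; the pages behind pagingFunction are never requested. One workaround is to send the form submissions yourself and collect the detail-page links. The Node.js sketch below (18+ for the built-in fetch) makes the same assumptions about the form as above (POST, field "pageNum", action hindu-boy-names.php). The last page number is read off the navigator by hand, and all function names are hypothetical.

```javascript
// Sketch: replace the recursive wget by requesting each results page in turn
// and collecting links to the meaning-of-*.shtml detail pages.
// ASSUMPTIONS: POST form, field "pageNum", action hindu-boy-names.php.
const LIST_URL = "http://www.indiaparenting.com/babynames/hindu-boy-names.php";

// Every distinct meaning-of-*.shtml link in one page of HTML, as absolute URLs.
function detailLinks(html) {
  const links = new Set();
  for (const m of html.matchAll(/href=["']([^"']*meaning-of-[^"']+\.shtml)["']/gi)) {
    links.add(new URL(m[1], LIST_URL).href); // resolve relative links
  }
  return [...links];
}

// Default page fetcher: one form submission per page, like pagingFunction().
async function fetchPage(pageNum) {
  const res = await fetch(LIST_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ pageNum: String(pageNum) }).toString(),
  });
  if (!res.ok) throw new Error(`page ${pageNum}: HTTP ${res.status}`);
  return res.text();
}

// Walk pages 1..lastPage and return every detail-page URL found, without duplicates.
async function collectDetailLinks(lastPage, getPage = fetchPage) {
  const all = new Set();
  for (let page = 1; page <= lastPage; page++) {
    for (const link of detailLinks(await getPage(page))) all.add(link);
  }
  return [...all];
}
```

The getPage parameter makes the loop testable with canned HTML. The resulting URLs could be written one per line to a file (say links.txt, a hypothetical name) and handed back to wget with wget -i links.txt, which keeps the original download step.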




Other information: I thought about generating a list of names directly from the regex, but it would be far too long and contain too many invalid names anyway, which is why I'd like to base it on actual names from one of these baby-name sites. I am also going to contact the site to see if they can just run a query on their DB and put the names in a file for me, and if all else fails, there are other sites to check out.
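A rough count supports this. Assume each [^aeiou] is limited to the 21 lowercase consonants; as written, the class also admits any other character, so the real count is higher. The pattern has 2 choices for [NR], 5 for [aeiou], 21 for each of the three [^aeiou], and 5^2 for [aeiou]{2}:

```latex
N = 2 \cdot 5 \cdot 21 \cdot 21 \cdot 5^{2} \cdot 21 = 250 \cdot 21^{3} = 250 \cdot 9261 = 2{,}315{,}250
```

That is over two million strings, almost none of them real names.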
