PerlMonks
Re: Secure Site Login with Mechanize by naikonta (Curate)
on May 22, 2007 at 04:43 UTC
Hi lazibowel,
Scraping login pages is not always as straightforward as we'd like, and it's often tricky. It also differs from one site to another. I think there are four things to consider:
Update: There might also be a fifth thing: the old trick of the URL referer (HTTP_REFERER). Sometimes it bites us until we realize that part of the login process checks the HTTP_REFERER header. I just remembered this one, though I haven't considered it as much as I did years ago.

So you need to watch every transaction between the site and the browser in detail before coding the emulation. One way is to use the LiveHTTPHeaders extension for Firefox/Mozilla. Another way is HTTP::Recorder, but I failed with that one, even though the docs are very straightforward, and I never tried it again. There are some nodes discussing this module that you might want to inspect. After some googling, I found the Web Scraping Proxy (WSP), along with the article that talks about it. This isn't exactly easy either, because I still needed to construct my own final scraper program from the skeleton WSP produced: I had to remove some unwanted and irrelevant transactions (such as image fetching), and I had to examine the returned pages (you can control how many pages it produces). But WSP did help me decide which requests to emulate.

Below is a stripped-down version of my final program, which logs in to the target site (my network provider's website) and fetches quota information. Note that at one point the site returns a page containing JavaScript code, which in turn contains a URL that must be fetched in the next step. In summary, my script fetches four different URLs and parses two of the returned pages to do the job.
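On the HTTP_REFERER point: with WWW::Mechanize you can set the Referer header yourself before submitting the form, so a login handler that checks it sees what it expects. A minimal sketch (the URL and form field names here are placeholders, not the real site's):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Fetch the login page first so the site can set its session cookie.
my $login_url = 'https://www.example.com/login';   # placeholder
$mech->get($login_url);

# Some login handlers reject a POST whose Referer header does not
# point back at the login page, so set it explicitly.
$mech->add_header( Referer => $login_url );

$mech->submit_form(
    form_number => 1,
    fields      => { username => 'me', password => 'secret' },  # placeholders
);
```

Mechanize carries cookies between requests automatically, so only the Referer needs this manual help.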
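Since the stripped-down program itself didn't survive here, this is a hedged sketch of the four-request flow described above (log in, follow a URL dug out of returned JavaScript, fetch the quota page); all URLs, field names, and regexes are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Request 1: the login page (sets the session cookie).
$mech->get('https://provider.example.com/login');      # placeholder URL

# Request 2: submit the login form.
$mech->submit_form(
    form_number => 1,
    fields      => { username => 'me', password => 'secret' },  # placeholders
);

# The response embeds the next URL inside JavaScript; pull it out
# with a regex instead of executing the JS. Pattern is illustrative.
my ($next_url) = $mech->content =~ m{location\.href\s*=\s*['"]([^'"]+)['"]};
die "no follow-up URL found in page\n" unless defined $next_url;

# Request 3: follow the URL the JavaScript would have loaded.
$mech->get($next_url);

# Request 4: the quota page, then scrape the figure we care about.
$mech->get('https://provider.example.com/quota');      # placeholder URL
my ($quota) = $mech->content =~ m{Quota:\s*([\d.]+\s*\w+)};   # placeholder pattern
print "remaining quota: $quota\n" if defined $quota;
```

Four URLs fetched, two pages parsed, matching the summary in the text; the regexes are the fragile part and have to be rebuilt from whatever the real site returns.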
Open source software? Share and enjoy. Make a profit from it if you can. Yet, share and enjoy!
In Section: Seekers of Perl Wisdom