http://qs321.pair.com?node_id=616675

lazybowel has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm trying to write a script that will log in to http://del.icio.us and show what is there. However, whenever I try to log in through https://secure.del.icio.us/login it does not work. This is what I have so far:
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias('Linux Mozilla');
$mech->get("https://secure.del.icio.us/login");
$mech->submit_form(
    fields => {
        user_name => 'username',
        password  => 'password',
    }
);
print $mech->content(format => 'text');

I tried this same script on Yahoo! Mail and it worked; they also use https.

Replies are listed 'Best First'.
Re: Secure Site Login with Mechanize
by chargrill (Parson) on May 22, 2007 at 03:03 UTC

    Try one of these: delicious modules on CPAN. In particular, WWW::Scraper::Delicious looks promising.

    (Assuming you just want to see what's there like you said)

    Update: Your script worked just fine for me as is, once I installed Crypt::SSLeay. What was the exact problem you were having?
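    Since the fix here was installing Crypt::SSLeay, a quick way to confirm that LWP actually has SSL support available (assuming Crypt::SSLeay is the backend, as in this thread) is a one-liner like the following. This is an environment check, not part of the poster's script:

    ```shell
    # Print the installed Crypt::SSLeay version; a "Can't locate ..." error
    # means LWP has no SSL backend and any https:// request will fail.
    perl -MCrypt::SSLeay -e 'print "Crypt::SSLeay $Crypt::SSLeay::VERSION\n"'
    ```
    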



    --chargrill
    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
      I have Crypt::SSLeay installed; that's the first thing I checked for. The problem is that when I run the script it kicks me to the public user page http://del.icio.us/username. It does exactly what it would do if I turned off cookies in Mozilla and tried to log in.
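      That symptom (ending up on the public user page, exactly as with cookies disabled) suggests the session cookie is not being kept or re-sent. WWW::Mechanize keeps an in-memory cookie jar by default, but a sketch that makes the jar explicit and file-backed, and that prints where the login actually landed, might help narrow it down. The field names and `form_number` below are assumptions carried over from the original post, not verified against the live login form:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize;
      use HTTP::Cookies;

      # Explicit, file-backed cookie jar so the session cookie set by
      # secure.del.icio.us survives and is sent on follow-up requests.
      my $mech = WWW::Mechanize->new(
          autocheck  => 1,
          cookie_jar => HTTP::Cookies->new( file => 'cookies.txt', autosave => 1 ),
      );
      $mech->agent_alias('Linux Mozilla');

      $mech->get('https://secure.del.icio.us/login');
      $mech->submit_form(
          form_number => 1,    # assumption: the login form is the first form on the page
          fields      => {
              user_name => 'username',
              password  => 'password',
          },
      );

      # Mechanize follows redirects automatically, so check where we ended up
      # before trusting the page content.
      print $mech->uri, "\n";
      print $mech->content( format => 'text' );
      ```

      If `$mech->uri` still shows the public user page, the next things to inspect are the redirect chain and any JavaScript in the response, as discussed below in this thread.
      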
Re: Secure Site Login with Mechanize
by naikonta (Curate) on May 22, 2007 at 04:43 UTC
    Hi lazybowel,

    Scraping login pages is not always as straightforward as we would like, and is often tricky; it also differs from one site to another. I think there are four things to consider:

    • Cookies: since you use WWW::Mechanize this shouldn't be much of a problem, because the module initializes an empty cookie jar by default (overriding the LWP::UserAgent behavior).
    • SSL connection: this doesn't seem to be your problem either, since you said you have Crypt::SSLeay installed and your script works with Yahoo!
    • Redirection: most sites perform a few redirections before reaching the final target URL that does the actual authentication.
    • JavaScript: some sites return a page containing JavaScript code that stores the URL to be fetched in the next step. The site my program targets is one such example. Another variation is a hidden (i)frame.

    Update: There might also be a fifth thing: the old trick of the URL referer (HTTP_REFERER). Sometimes it bites us until we realize that some step of the process checks the HTTP_REFERER header. I just remembered this one, although I haven't considered it as much as I did years ago.
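    For the referer point above, a minimal sketch of how WWW::Mechanize handles it (the host names and field names here are invented for illustration): Mechanize sends the current page's URL as the Referer header when a form is submitted or a link is followed, so fetching the login page first usually satisfies a referer check. For a request made outside that flow, the header can be forced by hand:

    ```perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Fetching the login page first gives Mechanize a "current" page, whose
    # URL it sends as the Referer header on the subsequent form submission.
    $mech->get('https://secure.example.com/login');
    $mech->submit_form( fields => { user => 'me', pass => 'secret' } );

    # For a request outside the normal click/submit flow, set the header
    # explicitly on the agent:
    $mech->add_header( Referer => 'https://secure.example.com/login' );
    $mech->get('https://secure.example.com/account');
    ```
    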

    So you need to closely watch every transaction between the site and the browser in detail before coding the emulation. One way is to use the LiveHTTPHeaders extension for Firefox/Mozilla. Another way is to use HTTP::Recorder, but I failed with that one, even though its docs are very straightforward, and I never tried it again. There are some nodes discussing this module you might want to inspect. After some googling, I found the Web Scraping Proxy (WSP), along with the article that describes it.

    Well, this is not easy, actually, because I still needed to construct my own final scraper program based on the skeleton produced by WSP. I had to remove some unwanted and irrelevant transactions (such as image fetching), and I had to examine the returned pages (you can control how many pages WSP produces). But WSP did help me decide which requests to emulate. Below is a stripped-down version of my final program, which logs in to the target site (my network provider's website) and fetches quota information. Note that at one point the site returns a page containing JavaScript code, which in turn contains a URL to be fetched in the next step. So, in summary, my script fetches four different URLs and parses two returned pages to do the job.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::TokeParser;
    use File::Basename;
    use subs 'fire';

    my $basedir = $0;
    $basedir = dirname $basedir;

    my %auth = ( username => 'username', password => 'password' );
    my $ua = LWP::UserAgent->new( cookie_jar => {} );
    my ( $content, $parser );

    #### INITIAL REQUEST
    fire get => 'http://www.example.com/mainpage.php';

    ### LOGIN
    fire post => 'https://ip.address/session.php', \%auth;

    # extract javascript content
    $parser = HTML::TokeParser->new( \$content );
    my $next_url;
    while ( my $token = $parser->get_token ) {
        next unless $token->[0] eq 'S' && $token->[1] eq 'script';
        $next_url = $token->[2]{src}, last if $token->[2]{src};
    }

    ### URL by JavaScript
    fire get => $next_url;

    # get the real content
    fire get => "https://the.same.ip.address/?";

    # final page
    $parser = HTML::TokeParser->new( \$content );
    # parse quota info from this page

    sub fire {
        my ( $method, @args ) = @_;
        my $res = $ua->$method(@args);
        #print STDERR "Checking for $args[0]\n";
        if   ( $res->is_success ) { $content = $res->content }
        else                      { die $res->status_line . "\n" }
    }

    Open source software? Share and enjoy. Make profit from it if you can. Yet, share and enjoy!

Re: Secure Site Login with Mechanize
by zentara (Archbishop) on May 22, 2007 at 16:19 UTC