Confused about LWP, Mechanize and UserAgent

by Anonymous Monk
on Jun 07, 2006 at 19:29 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm confused about how to access the pages on a website that I need to parse. I have the username and password, and I know I need to use the libraries above (LWP::UserAgent, WWW::Mechanize, LWP::Simple), but I'm seeing so many different ways of doing things, and nothing I've tried is working.

If the name of the site is www.mydomain.com and this page has a login screen (username, password), what do I need to add to my program so that it 'logs in'? My program grabs certain pages from the site and parses them. What I've been doing is passing LWP::Simple URLs that look like this:

http://www.mydomain.com/program.cgi?var1=apples&var2=bananas

It's been working great and I'm able to get many of the pages on this site. However, like I mentioned above, some of the pages on this site are 'restricted' and I need to 'login' using my parsing program so that the 'get' command will work.

Can anyone give me a good, simple example of how to login so that I can use the LWP::Simple library on restricted pages? I just want to be able to use LWP::Simple without any access problems.

Re: Confused about LWP, Mechanize and UserAgent
by Joost (Canon) on Jun 07, 2006 at 19:40 UTC
    You can't do all this with LWP::Simple. LWP::Simple is only intended for simple (heh) requests; even though it uses LWP::UserAgent behind the scenes, its simple interface doesn't expose the more advanced UserAgent features you'll need.

    You really should take a look at WWW::Mechanize. Mechanize is a subclass of UserAgent that's designed to make just this kind of "walking a site" task easy. It also handles cookies by default, so for most sites you shouldn't have any problems handling the session/login information (Mechanize should just do the right thing).

    Example:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $m = WWW::Mechanize->new();

    # point to the url of the login form
    $m->get("http://www.mydomain.com/login_form.html");

    # submit form with data specified
    # (the field values must be passed under the "fields" key)
    $m->submit_form(
        form_name => "login_form",
        fields    => {
            username => "xxxxxxx",
            password => "yyyyyyy",
        },
    );

    # now you are logged in.
    # do actions that you need to be logged in for
    $m->get("http://www.mydomain.com/program.cgi?something");
    # or
    $m->follow_link( text => "Click here to perform some action" );

    Once you're familiar with WWW::Mechanize, it's a lot easier to automate web applications with it than with LWP::UserAgent, and you can still use (almost) all of the tricks you can do with UserAgent too.
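
To illustrate the point about reusing UserAgent features, here is a minimal sketch that calls methods inherited from LWP::UserAgent on a Mechanize object. The User-Agent string and timeout value are made up for illustration, and `autocheck => 1` is passed explicitly on the assumption that you want failed requests to die rather than pass silently.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1: die on failed requests instead of continuing silently
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Methods inherited from LWP::UserAgent work on the Mechanize object
$mech->agent('my-parser/0.1');   # hypothetical User-Agent string
$mech->timeout(30);              # give up on a request after 30 seconds

# Mechanize sets up an in-memory cookie jar by default,
# which is what keeps the login session alive between requests
print "Mechanize is a LWP::UserAgent\n" if $mech->isa('LWP::UserAgent');
print "Cookie jar is in place\n"        if defined $mech->cookie_jar;
```

Nothing here touches the network; it only shows that the object you use for form handling is the same kind of object the LWP::UserAgent documentation describes.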

Re: Confused about LWP, Mechanize and UserAgent
by tomfahle (Priest) on Jun 07, 2006 at 19:47 UTC
    Hi, You can't do that with LWP::Simple. See http://search.cpan.org/~gaas/libwww-perl-5.805/lwptut.pod#HTTP_Authentication for a tutorial.
    use strict;
    use LWP;

    my $browser = LWP::UserAgent->new;
    $browser->credentials(
        'www.mydomain.com:80',
        # Look up this realm with your browser, eg. Firefox
        'web_server_usage_reports',
        'user_plinky' => 'Password_banjo123'
    );

    my $url = 'http://www.mydomain.com/program.cgi?var1=apples&var2=bananas';
    my $response = $browser->get($url);
    #......
    Hope this helps
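
Note that `credentials` covers HTTP authentication (the kind where the browser pops up a username/password dialog), not a login form embedded in an HTML page. Either way, it helps to check the response before parsing it. A minimal sketch follows; the helper name `fetch_or_die` is hypothetical and not part of LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch a URL with the given user agent and
# return the decoded body, or die with the HTTP status line.
sub fetch_or_die {
    my ($ua, $url) = @_;

    my $response = $ua->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;

    return $response->decoded_content;
}

# Usage, assuming $browser was configured with credentials() as above:
#   my $html = fetch_or_die($browser, $url);
```

A "401" status line from this helper would suggest that the host:port, realm, or username/password given to `credentials` does not match what the server expects.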
Re: Confused about LWP, Mechanize and UserAgent
by planetscape (Chancellor) on Jun 09, 2006 at 02:58 UTC

    Realize, too, that you can use a module such as HTTP::Recorder or WWW::Mechanize::Shell to record one (or many) successful manual form submission(s). The output of HTTP::Recorder, for instance, can be "dropped" right into your WWW::Mechanize scripts. This makes the development of such scripts trivially easy.

    HTH

    planetscape
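
For reference, a minimal sketch of the HTTP::Recorder setup, closely following the synopsis in that module's documentation. It assumes HTTP::Proxy and HTTP::Recorder are installed; the log file path and the wrapper sub name are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Recorder;

# Hypothetical wrapper: start a local proxy that records the requests
# passing through it as WWW::Mechanize code.
sub start_recording_proxy {
    my ($logfile) = @_;

    my $proxy = HTTP::Proxy->new( port => 8080 );

    # HTTP::Recorder acts as the proxy's user agent and logs each request
    my $agent = HTTP::Recorder->new;
    $agent->file($logfile);

    $proxy->agent($agent);
    $proxy->start();    # blocks while the proxy is serving requests
}

# Usage (assumed path):
#   start_recording_proxy("/tmp/recorded_session.pl");
```

With the proxy running, you would point your browser's HTTP proxy setting at localhost:8080, log in by hand, and then copy the recorded statements from the log file into your Mechanize script.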
