PerlMonks  

Hello Monks, I wanted to write a basic how-to on using WWW::Mechanize, as suggested in Tutorial Quest. I will give a basic overview of how to log in to a website. One DON'T that I will state right away to save future frustration: WWW::Mechanize DOES NOT SUPPORT JAVASCRIPT. One of my first tasks at my job was to write a crawler that logged in to a website and downloaded some account information. I will provide that portion here. Some other tools make working with Mechanize much easier: Firebug (or some other web page inspector) and Live HTTP Headers. For this project, I really only needed Firebug. You will need it to inspect the names and values of the parts of the website you are trying to access. You can also make Mechanize identify itself as a particular browser by passing one of several predefined aliases to agent_alias. In this example I did not set it, but you can do so like this: $mech->agent_alias($alias);
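As a small sketch of the agent_alias call mentioned above: the module can list the alias names it knows about, and one of them can then be selected. The choice of 'Linux Mozilla' here is only an example; which aliases are available, and the exact User-Agent string each one maps to, depend on the installed version of WWW::Mechanize.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# List the alias names this installed version of WWW::Mechanize knows about.
my @aliases = $mech->known_agent_aliases();
print "$_\n" for @aliases;

# Pick one of them; 'Linux Mozilla' is just an example choice.
$mech->agent_alias('Linux Mozilla');

# Show the User-Agent header that will now be sent with each request.
print "User-Agent is now: ", $mech->agent, "\n";
```

No request is made here, so this can be run without touching any website.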
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "https://homepage.com";

# Fetch the home page, then follow the link to the login page
$mech->get($url);
$mech->follow_link( url => 'https://account.login.page.com' );

# Verify that the last request went through
if ( $mech->success() ) {
    print "Successful Connection\n";
}
else {
    print "Not a successful connection\n";
}
You will notice in the first script that I used an if statement to verify that the request was successful. The $mech->success method is very useful for knowing whether it went through OK. From what I have learned so far, it is good practice to give yourself some kind of verification that what you did worked. You can also do this with:
print $mech->content;
or
$mech->dump_text;
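Here is a minimal sketch that combines the success check with two of the dump methods. To keep it runnable offline, it writes a tiny stand-in page to a temporary file and fetches it through a file:// URL; the page contents are invented for illustration. It also passes autocheck => 0 to the constructor: current versions of WWW::Mechanize enable autocheck by default, which makes a failed request die before an else branch like the one above could ever run.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Local stand-in page (hypothetical content) so no network access is needed.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><head><title>Demo</title></head><body>
<a href="next.html">Next</a>
<form action="/search"><input type="text" name="q"></form>
</body></html>
HTML
close $fh;

# autocheck => 0 keeps a failed request from dying, so success() can be tested.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new($path)->as_string );

my $report = '';
if ( $mech->success ) {
    # Send the dumps to an in-memory filehandle instead of STDOUT.
    open my $out, '>', \$report or die "Cannot open in-memory handle: $!";
    $mech->dump_links($out);    # one link URL per line
    $mech->dump_forms($out);    # each form with its fields
    close $out;
    print "Title: ", $mech->title, "\n", $report;
}
else {
    print "Request failed: ", $mech->response->status_line, "\n";
}
```

Called without an argument, the dump methods print to STDOUT, which is usually all you need while debugging.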
The $mech->dump_* methods are very useful for debugging or for finding out more about the page you accessed last. Use them frequently. There are dump_forms, dump_text, dump_links, etc. The next thing I had to do was enter a username and password, plus start and end dates for the report I wanted to receive. I did it with the following:
# This block of code fills in the required form fields.
# $start_date and $end_date were computed earlier with DateTime.
$mech->get("https://account.login.page.asp");

my $usr = "username";
my $pw  = "password";

# form_number() selects which form on the page the next field() call applies to
$mech->form_number(1);
$mech->field( "capsn", $usr );
$mech->form_number(2);
$mech->field( "capsp", $pw );
$mech->form_number(3);
$mech->field( "startdate", $start_date );
$mech->form_number(4);
$mech->field( "enddate", $end_date );

# Submit the currently selected form
$mech->click();
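If all four fields turn out to live in a single form, the form only needs to be selected once. The sketch below assumes that layout, which may not match the real site: it builds a hypothetical stand-in page in a temporary file, selects form 1, and fills everything in with set_fields. The field names are the ones from my script; the values are placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in for the login page: one form holding all four fields.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
<form action="/login" method="post">
  <input type="text" name="capsn">
  <input type="password" name="capsp">
  <input type="text" name="startdate">
  <input type="text" name="enddate">
  <input type="submit" value="Go">
</form>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# Select the form once, then fill in every field on it in one call.
$mech->form_number(1);
$mech->set_fields(
    capsn     => 'username',
    capsp     => 'password',
    startdate => '01/01/2014',    # placeholder date
    enddate   => '01/31/2014',    # placeholder date
);

# Check what would be sent before submitting anything.
print "$_ = ", $mech->value($_), "\n" for qw(capsn startdate enddate);

# Against the real site, $mech->click() would now submit the form.
```

Printing the values back with value() is another quick way to verify that the names you found in Firebug really match fields on the form.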
To write the form-filling code, I had to inspect the page with Firebug, find the name of each field (in quotes in my script), and set its value to the variable I declared. The click method did not need the button name specified, though sometimes you may have to do that. Yes, this site used SSL, and no, I did not need to do anything special to log in to it this time. However, I have had to crawl another website using SSL where I did need to do something special. This is what I had to do:
use WWW::Mechanize;
use IO::Socket::SSL qw();

my $mech = WWW::Mechanize->new(
    ssl_opts => {
        SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
        verify_hostname => 0,
    },
);
In this constructor call, I told it not to verify the SSL certificate or the hostname. The start and end dates, by the way, took a little more work with a different module, DateTime. I can get into that later.
Newcomers to this module should keep in mind that Mechanize DOES NOT interpret JavaScript. The only way around this that I have found so far is to use Live HTTP Headers to inspect the HTTP requests the browser makes as you navigate through the site. Where there is a GET, use $mech->get($url). Where there is a POST, use $mech->post($url, \%form), where %form holds the posted field names and values. I have successfully navigated a JavaScript-heavy web page using this method, but it is extremely tedious.
If you have a CHOICE, use WWW::Mechanize::Firefox, WWW::Selenium, or some other module that interprets JavaScript.
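The GET/POST replay approach could be sketched roughly as follows. Each entry in @steps stands for one request copied out of the header log; the URLs, field names, and dates are placeholders rather than a real site, and replay_steps is a helper name made up for this illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# One entry per request seen in the header log: the HTTP method, the URL,
# and (for a POST) the form fields that were sent. All values are placeholders.
my @steps = (
    { method => 'GET', url => 'https://example.com/report' },
    {
        method => 'POST',
        url    => 'https://example.com/report/run',
        form   => { startdate => '01/01/2014', enddate => '01/31/2014' },
    },
);

# Replay the recorded requests in order. Returns 1 if every request
# succeeded, or 0 as soon as one of them fails.
sub replay_steps {
    my ( $mech, @steps ) = @_;
    for my $step (@steps) {
        if ( $step->{method} eq 'POST' ) {
            $mech->post( $step->{url}, $step->{form} || {} );
        }
        else {
            $mech->get( $step->{url} );
        }
        return 0 unless $mech->success;
    }
    return 1;
}
```

To run it against a real site you would call replay_steps( WWW::Mechanize->new( autocheck => 0 ), @steps ) with your own recorded requests. Turning autocheck off lets the helper return 0 on a failed request instead of dying.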

In reply to WWW::Mechanize Basics by PerlSufi
