http://qs321.pair.com?node_id=723534

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I've written a scraper with WWW::Mechanize.

It logs in to a website every time I run it and returns a table of data.

So, now I'm thinking, why does it have to log in every time?

My browser doesn't have to log in every time. There's a timeout of an hour.

I've got this:

my $mech = WWW::Mechanize->new( cookie_jar => { file => "/path/to/cookies.txt", autosave => 1 } );
in my code, and that file is being written to every time the script runs, but it's empty except for "#LWP-Cookies-1.0" at the top.

There's no "remember me" option on this website, so they're only for-this-session type cookies, but why doesn't Mech save those for me? That way the website would let my script return five minutes later and see it as still logged in, just like it does with my browser.

Maybe there's something about session cookies I don't understand? Any help gratefully received.



Nobody says perl looks like line-noise any more
kids today don't know what line-noise IS ...

Re: Stay logged in between requests with WWW::Mechanize
by Anonymous Monk on Nov 14, 2008 at 02:45 UTC
      Anonymous Monk, you rock! It's working. Thank you.


Re: Stay logged in between requests with WWW::Mechanize
by oko1 (Deacon) on Nov 14, 2008 at 02:25 UTC

    Assuming the remote server is using cookies rather than session IDs (you can find out by closing your browser and then trying to reconnect without logging in: if you can, then it's cookies; otherwise, it's sessions), you need to not only specify the cookie_jar that you're using but to also load it. I find that setting the cookie_jar and "autosave" in the call to 'new()' causes W::M to spit out errors - so I tend to do it manually. Sample script follows:

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize;

    my $cookie_file = "/tmp/cookies";
    my $agent = WWW::Mechanize->new();

    if (! -s $cookie_file){
        $agent->get("http://okopnik.com/PHP/other/cookies.php");
        # Save the cookie from this session
        $agent->cookie_jar->save($cookie_file);
    }
    else {
        $agent->cookie_jar->load($cookie_file);
        $agent->get("http://okopnik.com/PHP/other/cookies.php");
    }

    print $agent->content;

    Assuming that you start with an empty "/tmp/cookies", this will populate it with a cookie the first time you run it (and show you a silly message that indicates that); subsequent runs will show that the cookie has been set and is active. Do note that the above PHP script has a fairly short cookie lifetime (2 minutes, if I recall correctly.)

    ben@Tyr:/tmp$ ./cookie_test
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" >
    <html>
    <head><title>Cookies</title></head>
    <body>
    <p>This page comes with yummy cookies. It started at 1226629378, has been<br>
    reloaded 0 times, and has lasted 0 seconds.</p>
    </body>
    </html>
    ben@Tyr:/tmp$ ./cookie_test
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" >
    <html>
    <head><title>Cookies</title></head>
    <body>
    <p>This page comes with yummy cookies. It started at 1226629378, has been<br>
    reloaded 1 time, and has lasted 6 seconds.</p>
    </body>
    </html>

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
      Assuming the remote server is using cookies rather than session IDs (you can find out by closing your browser and then trying to reconnect without logging in: if you can, then it's cookies; otherwise, it's sessions)

      I am confused. This doesn't make sense to me.

      The site is using cookies, I'm certain. It's not passing session vars around. But closing and restarting the browser would only prove whether there were cookies which expired at some later date, instead of at the end of the session. Those are the "remember me" type cookies.

      And, as I know that the site is sending cookies to my browser, why is Mech not getting those cookies and writing them to the cookies file? It knows where the file is and can write to it. But it doesn't write any cookies.




        I'm not an expert at this, but why would a browser, or WWW::Mechanize, write a session cookie to a file? It's only supposed to be around as long as the browser is running, so it makes sense to keep it only in memory.


        sas
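A minimal sketch of that behaviour, using HTTP::Cookies on its own (the cookie jar class that sits behind WWW::Mechanize). As documented, the jar skips cookies flagged "discard" when it saves, unless it was created with ignore_discard set to a true value. The helper name saved_cookie_count, the cookie name and value, and the domain www.example.com are made up for illustration; the discard flag is set by hand here to stand in for a session cookie.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use File::Temp qw(tempfile);

# Hypothetical helper: store one session-style cookie in a jar, save the
# jar, and count how many cookie lines actually reached the file.
sub saved_cookie_count {
    my ($ignore_discard) = @_;
    my (undef, $file) = tempfile(UNLINK => 1);

    my $jar = HTTP::Cookies->new(ignore_discard => $ignore_discard);

    # Arguments: version, key, value, path, domain, port, path_spec,
    # secure, maxage, discard. No maxage plus discard = 1 imitates a
    # cookie that is meant to die with the session.
    $jar->set_cookie(0, 'session', 'abc123', '/', 'www.example.com',
                     undef, 1, 0, undef, 1);
    $jar->save($file);

    open my $fh, '<', $file or die "Cannot read $file: $!";
    my $count = grep { /^Set-Cookie3:/ } <$fh>;
    close $fh;
    return $count;
}

print "default jar:         ", saved_cookie_count(0), " cookie line(s)\n";
print "ignore_discard => 1: ", saved_cookie_count(1), " cookie line(s)\n";
```

If this matches what the site sends, the first call should find no cookie lines in the file (just the "#LWP-Cookies-1.0" header, as in the original question) and the second should find one. Adding ignore_discard => 1 to the cookie_jar hash passed to WWW::Mechanize->new() would be the corresponding change in the scraper.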
      I find that setting the cookie_jar and "autosave" in the call to 'new()' causes W::M to spit out errors - so I tend to do it manually.
      Like what? I've never had problems with it.

        Starting with an empty cookie file and replacing the empty call to 'new()' with 'new(cookie_jar=>...', etc., results in

        Use of uninitialized value in pattern match (m//) at /usr/share/perl5/HTTP/Cookies.pm line 425.
        /tmp/cookies does not seem to contain cookies at /usr/share/perl5/HTTP/Cookies.pm line 426.

        If I simply use 'new()', it doesn't happen.
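One possible way around that warning, sketched under the assumption that it comes from loading a zero-byte file (which has no "#LWP-Cookies-1.0" header to match): remove the empty file before building the jar, since a missing file is skipped quietly on load. The helper name make_cookie_jar is hypothetical, and ignore_discard is included only so that session cookies survive a save.

```perl
use strict;
use warnings;
use HTTP::Cookies;

# Hypothetical helper: build a file-backed jar without tripping over a
# zero-byte cookie file.
sub make_cookie_jar {
    my ($file) = @_;

    # -z is true only for an existing, empty file. Such a file has no
    # header line, so drop it and let the jar recreate it on save.
    unlink $file if -z $file;

    return HTTP::Cookies->new(
        file           => $file,
        autosave       => 1,    # write the jar back when it is destroyed
        ignore_discard => 1,    # keep session cookies too
    );
}

# Assumed usage with WWW::Mechanize (cookie_jar accepts a jar object):
#   my $agent = WWW::Mechanize->new(
#       cookie_jar => make_cookie_jar("/tmp/cookies"),
#   );
```

Note that this deletes the file when it is empty, so it is only suitable for a path the script owns.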

