PerlMonks  

Lack of LWP::SIMPLE information

by Wassercrats
on Jul 12, 2003 at 13:02 UTC ( #273616=perlquestion )

Wassercrats has asked for the wisdom of the Perl Monks concerning the following question:

What should I have done to get information on LWP::Simple? At http://perldoc.com/perl5.8.0/lib.html there are links to information on:

LWP - The World-Wide Web library for Perl
LWP::ConnCache - Connection cache manager
LWP::Debug - debug routines for the libwww-perl library
LWP::MediaTypes - guess media type for a file or a URL
LWP::MemberMixin - Member access mixin class
LWP::Protocol - Base class for LWP protocols
LWP::RobotUA - A class for Web Robots
LWP::UserAgent - A WWW UserAgent class

The search tool at perldoc.com gave me a few links, but the one that seemed most related was http://www.perldoc.com/perl5.8.0/lib/LWP/Simple.html, and that didn't contain the information that I wanted. I wanted to know the differences between the simple and full-featured LWP modules, specifically whether the support for "transparent redirect handling" that's mentioned in the general LWP information at http://perldoc.com/perl5.8.0/lib/LWP.html exists for LWP::Simple. Someone told me that my script, which uses LWP::Simple, was able to get the page it was redirected to, but I know my script couldn't get certain other pages that required a redirect. I wanted to know if I needed the full-featured LWP for full redirect support.

I ended up searching Google for "lwp::simple" redirects, and found http://archive.develooper.com/libwww@perl.org/msg04147.html, which gave me the answer: "LWP::Simple does not perform automatic redirects. You will need to use LWP::UserAgent and LWP::Request." I'd like to confirm this with the docs. Could someone tell me where to look?

Thanks

(jeffa) Re: Lack of LWP::SIMPLE information
by jeffa (Bishop) on Jul 12, 2003 at 14:22 UTC
    There are (IIRC) two ways to redirect ... one is to use a Redirection Header and the other is to use a <meta> tag:
    <meta http-equiv="Refresh" content="15; URL=redirect.html"/>
    
    I set up two (three including the file redirected to) test files:
    1. redirect.html:
      <html>
      <meta http-equiv="Refresh" content="1; URL=redirected.html"/>
      <head>
      <title>redirect</title>
      </head>
      <body>
      you won't see me
      </body>
      </html>
    2. cgi-bin/redirect.cgi:
      #!/usr/bin/perl -Tw
      use strict;
      use CGI qw(redirect);
      print redirect('http://localhost/redirected.html');
    And then i fetched both pages with LWP::Simple. Here are the results:
      [jeffa]$ perl -MLWP::Simple -le'getprint "http://localhost/cgi-bin/redirect.cgi"'
      <html>
      <head>
      <title>redirected!</title>
      </head>
      <body>
      you've been redirected!
      </body>
      </html>
      [jeffa]$ perl -MLWP::Simple -le'getprint "http://localhost/redirect.html"'
      <html>
      <meta http-equiv="Refresh" content="1; URL=redirected.html"/>
      <head>
      <title>redirect</title>
      </head>
      <body>
      you won't see me
      </body>
      </html>
    So ... in semi-conclusion, it looks like LWP::Simple will transparently grab the redirected page IF the redirection was implemented with a redirection header, not a meta tag. Hope this helps.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      LWP::UserAgent (which LWP::Simple uses) follows redirects by default for GET and HEAD requests - this is a configurable behaviour, see requests_redirectable and redirect_ok in the documentation.
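      The configuration mentioned above might look roughly like the sketch below. It assumes LWP::UserAgent is installed; adding POST and the limit of 5 hops are illustrative choices only, not recommendations taken from the docs.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# requests_redirectable returns an array reference of method names.
# Adding POST here is only an illustration of changing the list.
push @{ $ua->requests_redirectable }, 'POST';

# Cap the number of redirect hops followed per request (illustrative value).
$ua->max_redirect(5);

print join(', ', @{ $ua->requests_redirectable }), "\n";
```

      After a request, $response->redirects should list any intermediate responses, which is one way to see whether a header redirect was actually followed.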

      It also stands to reason that only header redirects (i.e. Status: 302 Moved) are followed. Otherwise the module would need to parse the HTML looking for meta tags. Using the proper modules, or maybe just a simple regexp, this is not too hard to implement yourself, but it doesn't belong in the core module IMO. :)
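      As a rough sketch of the "simple regexp" route: the helper below, extract_meta_refresh, is a hypothetical name, and the pattern assumes http-equiv comes before content and that the content value is quoted, as in the test file earlier in the thread. A real HTML parser would cope with more variations.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (delay, url) for the first meta refresh tag,
# or an empty list if none is found. Assumes http-equiv precedes content.
sub extract_meta_refresh {
    my ($html) = @_;
    if ($html =~ m{<meta\s+http-equiv\s*=\s*["']?refresh["']?\s+content\s*=\s*["']\s*(\d+)\s*;\s*url\s*=\s*([^"'>\s]+)}i) {
        return ($1, $2);
    }
    return;
}

# Sample page modelled on the redirect.html test file from the thread.
my $page = q{<html> <meta http-equiv="Refresh" content="1; URL=redirected.html"/> <head> <title>redirect</title> </head> </html>};

my ($delay, $url) = extract_meta_refresh($page);
print "refresh after $delay s to $url\n" if defined $url;
```

      The URL in a meta tag is often relative (as it is here), so it would still need to be resolved against the address of the page before being fetched again.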

      Another way that some (thankfully few nowadays) do redirects is by scripting, such as JavaScript. This is a tougher nut to crack if needed.


      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
Re: Lack of LWP::SIMPLE information
by hossman (Prior) on Jul 12, 2003 at 18:25 UTC
    The search tool at perldoc.com gave me a few links, but the one that seemed most related was http://www.perldoc.com/perl5.8.0/lib/LWP/Simple.html, and that didn't contain the information that I wanted.

    That is in fact the documentation on LWP::Simple

    The API is *very* simplistic on purpose, it's designed with the hope that after reading that POD, you'll understand how to download simple WWW pages.

    For any functionality/questions beyond that, you should start reading all of the docs in the See Also section.

Re: Lack of LWP::SIMPLE information
by Wassercrats on Jul 12, 2003 at 13:09 UTC
    Sorry about the missing line breaks in the second paragraph. Too bad I can't edit it.
      I popped in and edited it for you ... hope you don't mind. Normally i would wait for a consideration and a good number of 'edit' votes ... but in this case i saw no harm.

      Regarding your question ... LWP::Simple is indeed simple to use (i searched for 'redirect' in the POD and found no match), so simple that what you want may indeed not be there. In the past, i used LWP and Co. for serious web bots ... these days i just use WWW::Mechanize (check out WWW::Mechanize::Shell as well). They make writing web bots fun again. ;)

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        Thanks for the edit! WWW::Mechanize sounds familiar. I'm barely a Perl programmer, and I'm almost done with a complex Perl script that I'd rather not re-work, but I'll read up on Mechanize. I'm wondering... since LWP::Simple can get the page that has the redirect code in it, maybe I should write a redirect routine myself... on the other hand I'd like to adjust the timeout, which I don't think LWP::Simple allows.
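        On the timeout: LWP::Simple can export the LWP::UserAgent object it uses as $ua when asked for it explicitly, so the timeout may be adjustable without switching modules. A minimal sketch, where fetch_with_timeout is a hypothetical wrapper and the 10 seconds is an arbitrary value:

```perl
use strict;
use warnings;
use LWP::Simple qw($ua get);

# $ua is the LWP::UserAgent object that LWP::Simple uses for its requests.
# fetch_with_timeout is a hypothetical wrapper name for this sketch.
sub fetch_with_timeout {
    my ($url, $seconds) = @_;
    $ua->timeout($seconds);    # timeout in seconds
    return get($url);          # get() returns undef on failure
}

# Set the timeout directly as well (illustrative value).
$ua->timeout(10);
```

        A call such as fetch_with_timeout('http://localhost/redirected.html', 10) would then return the page content, or undef if the request failed or timed out.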
