Why would LWP::Simple::get stop working?

by dpmott (Scribe)
on Jun 20, 2005 at 18:52 UTC ( [id://468453]=perlquestion )

dpmott has asked for the wisdom of the Perl Monks concerning the following question:

(post updated 06/20/2005 17:00)
I have a number of scripts that run automatically on my WinXP Pro box running ActiveState Perl 5.8.0 build 806. For the last several days these scripts have not been working.

My environment variables (i.e. http_proxy and such) have not been changed. I don't use a proxy.

So, I run them by hand. On a line like this:
my $doc = get URL;
my script locks up.

At a command line on the same box, I can type:
perl -S GET URL
where URL is the same as what is in the script. That works fine, presumably because GET.pl uses LWP::UserAgent.

Putting that assumption to the test, I wrote this small script:

#!perl
# What's Wrong with LWP::Simple::get ?
use strict;
use LWP::UserAgent;
use LWP::Simple;
use constant URL => 'http://www.google.com';

main();
exit(0);

sub main {
    print "Fetching '" . URL . "' with LWP::UserAgent\n";
    my $ua = LWP::UserAgent->new();
    my $response = $ua->get(URL);
    if ( $response->is_success ) {
        my $len = length($response->content);
        print "Retrieved $len bytes\n";
    }
    else {
        my $code = $response->code;
        print "Failed to retrieve URL: $code\n";
    }

    print "Fetching '" . URL . "' with LWP::Simple\n";
    my $content = LWP::Simple::get( URL );
    if ( $content ) {
        my $len = length($content);
        print "Retrieved $len bytes\n";
    }
    else {
        print "Failed to retrieve URL\n";
    }
}

I can run this script on a *NIX box with no problem. On my WinXP Pro box, however, the $ua->get() call works, but the script never gets past the LWP::Simple::get() call.

Note that the scripts that are now failing have not been modified in weeks, if not months.

So, this post is both a request for help and a poll to see if anyone else on WinXP is having similar problems with LWP::Simple::get.

So, has anyone else seen a problem like this? If so, how do I fix it?


Update: After uninstalling/reinstalling ActiveState Perl, my LWP modules seem to be working fine. Honestly, I haven't modified my environment or Perl installation in over two weeks, so I really can't fathom what could have changed.

I appreciate everyone's feedback on this thread. I've learned a lot about the LWP modules and how to keep them happy :)

Replies are listed 'Best First'.
Re: Why would LWP::Simple::get stop working?
by tlm (Prior) on Jun 20, 2005 at 19:32 UTC

    It could be that the server in question started checking for the user agent type of all incoming requests, and rejecting those corresponding to the $ua object of LWP::Simple. To test this hypothesis, add the following line somewhere after the definition of my $ua, but before calling LWP::Simple::get:

    $LWP::Simple::ua->agent( $ua->agent );
    The idea is to make the agent field of LWP::Simple's internal user agent object (which is also called $ua) match the default agent field of a LWP::UserAgent object (like your $ua).

    Update: I should have pointed out in my original post that if the hypothesis above is correct, this would strongly suggest that the site you are trying to access has an anti-robot policy. The workaround implied by the code above is a technical solution, but it is up to you to determine what the site's robot policy is (starting with reading their robots.txt file, if any), and weigh the consequences of circumventing it. Some sites are very strict and unequivocal about this.

    Update 2: As indicated by the overstrike above, the particular details of what I proposed were wrong, because importing $ua from LWP::Simple has side-effects that don't take place if one omits the import step and uses the fully qualified $LWP::Simple::ua, which I was not aware of (thanks++ to dpmott for pointing it out).

    the lowliest monk

      This suggestion is getting me closer.

      It turns out that you must have a 'use' statement that looks like this to get access to the $ua object:
      use LWP::Simple qw/$ua/;
      If you want to use get() without fully qualifying it, then you also have to include that along with $ua.

      When I specify the '$ua' in the import list, everything starts working, and working well.

      If I do not specify anything in the import list, then LWP::Simple::get() or just get() never returns.

      It looks like my PPM is broken, too, so I think that I have a re-install in my near future. I must've gotten my hands on a bad module somewhere...
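
      The import form described above can be sketched as follows. This is a minimal illustration, not a confirmed fix: the helper name fetch_report and the 15-second timeout are assumptions made for the example, and the script only touches the network when a URL is passed on the command line.

      ```perl
      use strict;
      use warnings;
      use LWP::Simple qw($ua get);   # name both $ua and get in the import list

      # Assumption: 15 seconds is an arbitrary, illustrative timeout.
      $ua->timeout(15);

      # Hypothetical helper: fetch a URL and describe the outcome.
      sub fetch_report {
          my ($url) = @_;
          my $content = get($url);
          return defined $content
              ? 'Retrieved ' . length($content) . ' bytes'
              : 'Failed to retrieve URL';
      }

      # Only fetch when a URL is given, e.g.: perl script.pl http://www.google.com
      print fetch_report($ARGV[0]), "\n" if @ARGV;
      ```

      Testing with defined() rather than plain truth avoids treating a page whose body is "0" or empty as a failure.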
Re: Why would LWP::Simple::get stop working?
by ikegami (Patriarch) on Jun 20, 2005 at 19:22 UTC

    The behaviour of LWP changes based on whether the following returns zero or more than zero matches.

    grep {lc($_) eq "http_proxy"} keys %ENV;

    Has that changed recently? LWP::UserAgent isn't used for HTTP by LWP::Simple if that expression returns zero.
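
    A minimal sketch of that check as a small script, so it can be run before and after the problem appears. The helper name http_proxy_keys is hypothetical and not part of LWP; it only inspects the hash of environment variables it is given.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the environment variable names that equal
    # "http_proxy" when compared case-insensitively (same test as the grep above).
    sub http_proxy_keys {
        my ($env) = @_;
        my @keys = sort grep { lc($_) eq 'http_proxy' } keys %$env;
        return @keys;
    }

    my @found = http_proxy_keys(\%ENV);
    if (@found) {
        print "Proxy variable(s) set: @found\n";
    }
    else {
        print "No http_proxy variable set\n";
    }
    ```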

      Good thought, but alas...
      c:\>perl -e "@a=grep {/http/i} keys %ENV; print int @a"
      0
      c:\>

        Why "alas"? I indicated you might have started noticing the problem because of a change in that value, but you don't demonstrate a lack of change.

        What you did show is that you end up in LWP::Simple's _trivial_http_get method. (Usually, LWP calls end up in LWP::UserAgent.) You could try adding debug statements to it. It could be that you end up in an endless redirection loop that LWP::UserAgent somehow avoids.

        btw, no problems here:

        >perl -le "print scalar grep {/http/i} keys %ENV"
        0
        >perl -le "use LWP (); print $LWP::VERSION"
        5.64
        >perl script.pl
        Fetching 'http://www.google.com' with LWP::UserAgent
        Retrieved 2311 bytes
        Fetching 'http://www.google.com' with LWP::Simple
        Retrieved 2311 bytes
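
        To illustrate what an endless redirection loop means and how a client can guard against one, here is a self-contained sketch. It does not reproduce LWP's internals: the %redirect_to table, the example.test URLs, and the resolve helper are all hypothetical stand-ins for a real server and client.

        ```perl
        use strict;
        use warnings;

        # Hypothetical stand-in for a server: maps a URL to the URL it redirects to.
        my %redirect_to = (
            'http://example.test/a' => 'http://example.test/b',
            'http://example.test/b' => 'http://example.test/a',   # loops back to /a
            'http://example.test/c' => 'http://example.test/d',
        );

        # Follow redirects and return the final URL, or undef if a URL repeats.
        sub resolve {
            my ($url) = @_;
            my %seen;
            while ( exists $redirect_to{$url} ) {
                return undef if $seen{$url}++;    # already visited: loop detected
                $url = $redirect_to{$url};
            }
            return $url;
        }

        for my $start ( 'http://example.test/a', 'http://example.test/c' ) {
            my $final = resolve($start);
            print "$start => ", ( defined $final ? $final : 'redirect loop' ), "\n";
        }
        ```

        Without the %seen check, the first lookup would never return, which is the kind of hang described in the original post.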
Re: Why would LWP::Simple::get stop working?
by Steve_p (Priest) on Jun 20, 2005 at 19:11 UTC

    At first blush, I'd say it's firewall-related. What does

    print "Failed to retrieve URL: $code\n";

    print out? I'm assuming it returns a 500, but without knowing exactly what your get returns, it is really hard to diagnose.

      Oh, sorry. I guess I wasn't clear on that point. The LWP::UserAgent::get call works just fine. It tells me how many bytes it has downloaded.

      The if/else under the LWP::Simple::get() is never reached.
