
Re: using LWP::Simple to fetch binary file (gnu zip)

by lune (Pilgrim)
on Jan 24, 2012 at 15:39 UTC (#949717)


in reply to using LWP::Simple to fetch binary file (gnu zip)

I suppose your print statement, which appears before the getstore call, gives you the expected result.

So why don't you check the status code returned by getstore first?

My guess is: 401 (RC_UNAUTHORIZED)

Re^2: using LWP::Simple to fetch binary file (gnu zip)
by Largins (Acolyte) on Jan 25, 2012 at 11:54 UTC

    The actual error is 501 Not Implemented.
    Here is a simplified bit of code that returns the same error.
    First, a copy of the robots.txt file from the site used (www.archive.org):

    ##############################################
    #
    # Welcome to the Archive!
    #
    ##############################################
    # Please crawl our files.
    # We appreciate if you can crawl responsibly.
    # Stay open!
    ##############################################
    # slow down the ask jeeves crawler which was hitting our SE a little too fast
    # via collection pages. --Feb2008 tracey--
    User-agent: Teoma
    Disallow: /control/
    Disallow: /report/
    Sitemap: http://www.archive.org/sitemap/sitemap.xml
    Crawl-delay: 10

    User-agent: *
    Disallow: /control/
    Disallow: /report/
    Disallow: /details/goldenbull2007john/
    Disallow: /stream/goldenbull2007john/
    Disallow: /download/goldenbull2007john/
    Disallow: /14/items/goldenbull2007john/goldenbull2007john_djvu.txt
    Sitemap: http://www.archive.org/sitemap/sitemap.xml
    Crawl-delay: 10

    Next, a small portion of the sitemap.xml:

    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://www.archive.org/sitemap/sitemap_00000.xml.gz</loc>
        <lastmod>2012-01-24T11:32:13Z</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://www.archive.org/sitemap/sitemap_00001.xml.gz</loc>
        <lastmod>2012-01-24T11:32:18Z</lastmod>
      </sitemap>

    And a simple Perl script with error checking:

    #!/usr/bin/env perl
    #
    # Name: TestFetch.pl
    #
    # Requires Internet access
    #
    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::Parser;
    use HTTP::Status qw(:constants :is status_message);

    package main;

    my $text = 'http://www.archive.org/sitemap/sitemap_00000.xml.gz';
    my $filename = 'sitemap_00000.xml.gz';
    my $hstatus = 0;

    $hstatus = LWP::Simple->getstore ($text, $filename);

    if($hstatus != HTTP_OK) {
        print "$hstatus: ", status_message($hstatus), "\n";
    }

    I am able to fetch the file manually.
    Largins

      The problem is you are calling a function as a method:

      $hstatus = LWP::Simple->getstore ($text, $filename);

      Change that to:

      $hstatus = getstore ($text, $filename);

      And it will work.

      Effectively, you are calling the function with the string 'LWP::Simple' as the first argument, where it expects a URL. It tries to parse that string to discover the protocol (http://, https://, ftp:// etc.) it should use, doesn't find anything it recognises, and so returns 501 (Not Implemented).
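
      The argument shift described above can be seen without any network access. The following is a minimal sketch in core Perl only: My::Fetcher and fetch_to_file are hypothetical stand-ins for LWP::Simple and getstore, and the function merely reports what it received rather than fetching anything.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a plain exported function such as getstore();
# it reports the arguments it receives instead of doing any network I/O.
package My::Fetcher;

sub fetch_to_file {
    my ($url, $file) = @_;
    return "url=" . (defined $url ? $url : 'undef')
         . " file=" . (defined $file ? $file : 'undef');
}

package main;

my $url  = 'http://www.archive.org/sitemap/sitemap_00000.xml.gz';
my $file = 'sitemap_00000.xml.gz';

# Function call: the arguments arrive exactly as written.
my $as_function = My::Fetcher::fetch_to_file($url, $file);

# Method call: Perl prepends the class name, so every argument shifts by one
# and the "URL" slot now holds the string 'My::Fetcher'.
my $as_method = My::Fetcher->fetch_to_file($url, $file);

print "function: $as_function\n";
print "method:   $as_method\n";
```

      In the method form the real URL ends up in the file-name slot, which is the same mix-up that happens when getstore is called as LWP::Simple->getstore.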


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

        Thanks very much.
        Although I have been programming for many years (first line of code on a Burroughs t series in 1965), I am new (by a few months) to Perl.
        For the life of me, I couldn't see what was wrong, but now I see my error.

        Thanks again, and thanks to PerlMonks for this site.

        Largins
