PerlMonks  

Re^2: using LWP::Simple to fetch binary file (gnu zip)

by Largins (Acolyte)
on Jan 25, 2012 at 11:54 UTC


in reply to Re: using LWP::Simple to fetch binary file (gnu zip)
in thread using LWP::Simple to fetch binary file (gnu zip)

The actual error message is "501 Not Implemented".
Here is a simplified bit of code that returns the same error.
First, a copy of the robots.txt file from the site used (www.archive.org):

##############################################
#
# Welcome to the Archive!
#
##############################################
# Please crawl our files.
# We appreciate if you can crawl responsibly.
# Stay open!
##############################################

# slow down the ask jeeves crawler which was hitting our SE a little too fast
# via collection pages. --Feb2008 tracey--
User-agent: Teoma
Disallow: /control/
Disallow: /report/
Sitemap: http://www.archive.org/sitemap/sitemap.xml
Crawl-delay: 10

User-agent: *
Disallow: /control/
Disallow: /report/
Disallow: /details/goldenbull2007john/
Disallow: /stream/goldenbull2007john/
Disallow: /download/goldenbull2007john/
Disallow: /14/items/goldenbull2007john/goldenbull2007john_djvu.txt
Sitemap: http://www.archive.org/sitemap/sitemap.xml
Crawl-delay: 10

Next, a small portion of sitemap.xml:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.archive.org/sitemap/sitemap_00000.xml.gz</loc>
    <lastmod>2012-01-24T11:32:13Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.archive.org/sitemap/sitemap_00001.xml.gz</loc>
    <lastmod>2012-01-24T11:32:18Z</lastmod>
  </sitemap>

And a simple Perl script with error checking:
#!/usr/bin/env perl
#
# Name: TestFetch.pl
#
# Requires Internet access
#
use strict;
use warnings;
use LWP::Simple;
use HTML::Parser;
use HTTP::Status qw(:constants :is status_message);

package main;

my $text = 'http://www.archive.org/sitemap/sitemap_00000.xml.gz';
my $filename = 'sitemap_00000.xml.gz';
my $hstatus = 0;

$hstatus = LWP::Simple->getstore ($text, $filename);
if($hstatus != HTTP_OK) {
    print "$hstatus: ", status_message($hstatus), "\n";
}

I am able to fetch the file manually.
Largins

Re^3: using LWP::Simple to fetch binary file (gnu zip)
by BrowserUk (Patriarch) on Jan 25, 2012 at 12:02 UTC

    The problem is you are calling a function as a method:

    $hstatus = LWP::Simple->getstore ($text, $filename);

    Change that to:

    $hstatus = getstore ($text, $filename);

    And it will work.

    Effectively, you are calling the function with the string 'LWP::Simple' as the first argument, where it expects a URL. It tries to parse that string to discover which protocol (http://, https://, ftp://, etc.) it should use and doesn't find anything it recognises, so it returns 501 Not Implemented.
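
The argument shift described above can be seen without touching the network. The following is only a minimal sketch: `My::Fetch` and `show_args` are hypothetical names invented for illustration, standing in for a plain exported function such as getstore; the sub simply reports the arguments it received.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain (non-method) function such as getstore().
# It only reports the arguments it was given; no network I/O happens here.
package My::Fetch;
sub show_args { return join ' | ', @_ }

package main;

my $url  = 'http://www.archive.org/sitemap/sitemap_00000.xml.gz';
my $file = 'sitemap_00000.xml.gz';

# Method-call syntax: Perl prepends the class name string to the arguments,
# so the function sees 'My::Fetch' where it expected the URL.
my $as_method = My::Fetch->show_args($url, $file);

# Plain function call: the arguments arrive unchanged.
my $as_function = My::Fetch::show_args($url, $file);

print "method:   $as_method\n";
print "function: $as_function\n";
```

With the method-call form the function receives three arguments, the first being the package name; with the function-call form it receives the URL and the filename, as intended.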



      Thanks very much.
      Although I have been programming for many years (first line of code on a Burroughs t series, 1965), I am new (by a few months) to Perl.
      For the life of me, I couldn't see what was wrong, but now I see my error.

      Thanks again, and thanks to PerlMonks for this site.

      Largins
