How can I do this better?

by spaz (Pilgrim)
on Jan 19, 2001 at 04:03 UTC

spaz has asked for the wisdom of the Perl Monks concerning the following question:

Second question today, please nobody shoot me!

I have a subroutine that fetches a URL and saves it to the named file. I use LWP::UserAgent and HTTP::Request. As you can see from the code, the subroutine tries three times to fetch the URL (sleeping for 5 seconds between fetches), and if all three attempts fail, it returns 0, leaving the caller to handle the failure appropriately.

My question though is: Is there a better way to do this?
use strict;
use LWP::UserAgent;

#################################################################
# Returns the success of storing $url into newly created $output
sub get_page {
    my( $url, $output ) = @_;

    my $ua = LWP::UserAgent->new;
    $ua->agent( "$0/0.5 " . $ua->agent );
    $ua->timeout( 30 );

    my $req = HTTP::Request->new( GET => $url );
    $req->header( 'Accept' => 'text/html, image/gif' );

    for( my $i = 1; $i <= 3; $i++ ) {
        # Send the request
        my $res = $ua->request( $req );
        if( $res->is_success ) {
            open( OUT, ">$output" ) or die "Couldn't open $output: $!";
            print OUT $res->content;
            close( OUT ) or die "Couldn't close $output: $!";
            return 1;
        }
        else {
            print "Couldn't get $url on $i/3\n";
            print "Sleeping 5 seconds...\n";
            sleep 5;
            print "Gonna try again...\n";
        }
    }
    return 0;
}

Replies are listed 'Best First'.
Re: How can I do this better?
by mwp (Hermit) on Jan 19, 2001 at 07:10 UTC
    For the lazy in you, allow me to introduce LWP::Simple:
    use strict;
    use LWP::Simple qw(getstore head is_success);

    my @allowed = ('text/html', 'image/gif');
    my $content = (head($url))[0];
    die "Invalid content type! Please check URL.\n"
        unless grep( $content =~ /\Q$_\E/, @allowed );

    my $stored = 0;
    FETCH: for (1..5) {
        if( is_success getstore($url, $output) ) {
            printf "$output %d bytes OK\n", -s $output;
            $stored = 1;
            last FETCH;
        }
        else {
            sleep 5;
        }
    }
    # Give up only after all five attempts have failed.
    die "Unable to fetch and store: $!\n$url => $output\n" unless $stored;
    Not bad, eh? Let it do the work for you. :)

    Update: Fixed a small bug, thanks chipmunk!

    Update: spaz, try this on for size (LWP::Simple perldoc, approx line 148):

    use LWP::Simple qw($ua);
    $ua->timeout(40);   # set timeout, default 180

      That's all well and good, but the reason I did it myself was so that I could set the timeout value. And I didn't see any way to do that in man LWP::Simple.

      Any more help?
Re: How can I do this better?
by dws (Chancellor) on Jan 19, 2001 at 04:23 UTC
    1. You accept HTML and GIF, but not JPEG or PNG? Are you really sure that's what you want to do?
    2. You're sleeping one too many times for failing URLs.
    3. If you take a look at why you're unable to fetch a URL, you may discover that it is pointless to retry two more times (costing you another 2 x (roundtrip + 5) seconds, with an extra 5 seconds thrown in if you don't fix (2)); see the sketch after this list.
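
    A minimal sketch of points 2 and 3 applied to get_page's retry loop (an illustration, not dws's code; it assumes $ua, $req, $url, and $output from the original sub are in scope):

        for my $attempt (1 .. 3) {
            my $res = $ua->request( $req );
            if( $res->is_success ) {
                open( OUT, ">$output" ) or die "Couldn't open $output: $!";
                print OUT $res->content;
                close( OUT ) or die "Couldn't close $output: $!";
                return 1;
            }
            # A 4xx response (404 Not Found, 403 Forbidden, ...) is unlikely
            # to succeed on a retry, so give up right away instead of waiting.
            if( $res->code >= 400 && $res->code < 500 ) {
                print "Giving up on $url: ", $res->status_line, "\n";
                return 0;
            }
            print "Couldn't get $url on $attempt/3 (", $res->status_line, ")\n";
            sleep 5 if $attempt < 3;   # don't sleep after the final attempt
        }
        return 0;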
Re: How can I do this better?
by AgentM (Curate) on Jan 19, 2001 at 04:16 UTC
    Your question is unclear. What is it that you want to do better? Do you want a better way of handling the case where a web page doesn't exist or a server is down? (Catch any error that the web server returns, using its error number and string information, or Net::Ping a potentially dead server.) Or do you want to be more instructive to the user? (Try warn or any of the Carp utilities.) Would you like to improve speed efficiency? Also, no biggie, but you could try the more Perlish for loop: for(1..3){ stuff; }
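
    A rough sketch of the error-reporting and Net::Ping ideas, as a drop-in for the body of get_page (an illustration only, not AgentM's code; it assumes $ua, $req, and $url from the original sub are in scope, and it pulls in URI and Net::Ping, which the original does not use):

        use Carp qw(carp);
        use Net::Ping;
        use URI;

        # Check whether the server answers at all before bothering to retry.
        my $host   = URI->new( $url )->host;
        my $pinger = Net::Ping->new;
        carp "$host does not answer pings; it may be down" unless $pinger->ping( $host );
        $pinger->close;

        # The more Perlish loop, reporting the server's error code and message.
        for ( 1 .. 3 ) {
            my $res = $ua->request( $req );
            return 1 if $res->is_success;
            carp "Couldn't get $url: ", $res->code, " ", $res->message;
            sleep 5;
        }
        return 0;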
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      Well, I was wondering whether there were any inefficiencies or Bad Form (c) portions in the code, and whether there are any shorter/clearer ways to do this.
Re: How can I do this better?
by zzspectrez (Hermit) on Jan 19, 2001 at 06:31 UTC

    If the routine will be downloading binary data such as images or files, you might consider doing binmode(OUT); immediately after the open. Otherwise, on some systems such as Windows, the data will be corrupted on write, because in text mode every "\n" is translated to the two characters "\015\012". This works fine for text data but will cause problems if the data is not text. For more information see perlport and my problem with this at the node Windows CRLF Confusion.
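
    For example, the write in get_page would become (a sketch of this suggestion, otherwise unchanged from the original):

        open( OUT, ">$output" ) or die "Couldn't open $output: $!";
        binmode( OUT );   # don't translate "\n" to "\015\012" on Windows
        print OUT $res->content;
        close( OUT ) or die "Couldn't close $output: $!";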

    zzSPECTREz
