PerlMonks
Is there a way to continue downloads when using WWW::Mechanize?

by beable (Friar)
on Jul 20, 2004 at 01:43 UTC ( [id://375759] )

beable has asked for the wisdom of the Perl Monks concerning the following question:

Is there an easy way, when using WWW::Mechanize, to resume a download that has been interrupted by a timeout or by the network going down?
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new;
    $mech->env_proxy;
    my $url = "http://127.0.0.1/downloads/index.html";
    $mech->get($url);
    my $link = $mech->find_link(text_regex => qr/really big download/i);
    $mech->get($link);
    __END__
I figure I could replace $mech->get($link); with system("wget -c " . $link->url_abs);, but I'd prefer to do it in Perl. Is there an easy way?

Re: Is there a way to continue downloads when using WWW::Mechanize?
by PodMaster (Abbot) on Jul 20, 2004 at 06:09 UTC
    Basically you need to do what wget -c does: check the size of the local file, then request the URI with a Range header...

    update: example

    C:\>GET http://crazyinsomniac.perlmonk.org/css/+.css
    /* Here be PodMaster's modifications/additions */
    textarea {width: 100%; height: 25em;}

    C:\>GET -H "Range: bytes=50-" http://crazyinsomniac.perlmonk.org/css/+.css
    textarea {width: 100%; height: 25em;}

    C:\>
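    In WWW::Mechanize terms that works out to something like the sketch below. The URL and local filename are assumptions for illustration; the key points are that a 206 Partial Content status means the server honored the Range header, while a plain 200 means it ignored it and sent the whole file:

        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new;
        my $url  = "http://127.0.0.1/downloads/bigfile.tar.gz";  # hypothetical download
        my $file = "bigfile.tar.gz";                             # local (possibly partial) copy

        my $offset = -s $file || 0;    # bytes already on disk, 0 if none
        my $res    = $mech->get($url, Range => "bytes=$offset-");

        if ($res->code == 206) {
            # Partial Content: append only the missing tail
            open my $fh, '>>', $file or die "open $file: $!";
            binmode $fh;
            print {$fh} $res->content;
            close $fh;
        }
        elsif ($res->code == 200) {
            # Server ignored the Range header; overwrite with the full body
            open my $fh, '>', $file or die "open $file: $!";
            binmode $fh;
            print {$fh} $res->content;
            close $fh;
        }

    Note that Mechanize still buffers the whole response in memory, so for truly huge files you might prefer plain LWP::UserAgent with its :content_file option or a content callback.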

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Perhaps then...

      my $offset = how_much_already_downloaded($link);
      $mech->get($link, Range => "bytes=$offset-");
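
      how_much_already_downloaded is a placeholder; one way to flesh it out, assuming the partial download is saved under the last path segment of the link's URL, is a plain file-size check:

          sub how_much_already_downloaded {
              my ($link) = @_;
              # Assumption: the partial file is named after the URL's last path segment
              my ($file) = $link->url_abs =~ m{([^/]+)$}
                  or return 0;
              return -s $file || 0;    # bytes on disk, 0 if nothing saved yet
          }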

          --k.


Re: Is there a way to continue downloads when using WWW::Mechanize?
by tilly (Archbishop) on Jul 20, 2004 at 06:14 UTC
    An easy way? I don't know of one. But it should be possible.

    What you'd have to do is save partial downloads, stare at the spec, and work out the correct header to send (you want a Range header, but you'll have to work out the parameters). Even then it will only work if the server supports ranges, which it may or may not. So test for that first; a rough check is sketched below.
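    One such check, assuming the server advertises resumability through the Accept-Ranges response header (some servers honor Range requests without advertising them, so a miss here is only a hint):

        use LWP::UserAgent;

        my $ua  = LWP::UserAgent->new;
        my $res = $ua->head($url);    # $url is the download URL in question
        my $accept = $res->header('Accept-Ranges') || '';
        if ($accept eq 'bytes') {
            print "server advertises byte-range support\n";
        }
        else {
            print "no Accept-Ranges header; resuming may or may not work\n";
        }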

    If you do this, then you'll be using the same technique that wget does. If you do the work, it may be worthwhile to think hard about how to integrate the functionality into either LWP or WWW::Mechanize in some easy way. (Offhand, I'd suggest giving LWP a get_with_retry method that accepts parameters for its retry logic.)

    Honestly I have to say that wget sounds like the solution that I'd be inclined to use.
