LWP::UserAgent and HUGE FILES

by julsford
Monks ~ I've hit one of those wonderful issues that you only learn about when it's too late: when I use LWP::UserAgent to validate the existence of a file before creating a link to it, it reads the entire file into memory. When the files are huge, they run me out of system memory. I've attempted to set the following:
my $ua = LWP::UserAgent->new(env_proxy => 1, max_size => 100);
I don't appear to be having any success. Advice would be appreciated.


Re: LWP::UserAgent and HUGE FILES
by cacharbe (Curate) on Apr 29, 2003 at 18:10 UTC
Re: LWP::UserAgent and HUGE FILES
by shotgunefx (Parson) on Apr 29, 2003 at 18:39 UTC
    I believe some servers don't respond to HEAD appropriately, so if that doesn't work,
    from the LWP pod:
    $ua->simple_request($request, [$arg , $size])
    This method dispatches a single WWW request on behalf of a user, and returns the response received. The $request should be a reference to a HTTP::Request object with values defined for at least the method() and url() attributes. If $arg is a scalar it is taken as a filename where the content of the response is stored.

    If $arg is a reference to a subroutine, then this routine is called as chunks of the content is received. An optional $size argument is taken as a hint for an appropriate chunk size.

    If $arg is omitted, then the content is stored in the response object itself.
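For the simple case where the server does answer HEAD properly, a minimal sketch of the check via simple_request might look like the following. The helper name exists_by_head is made up for illustration and is not part of LWP; URLs are taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper (not part of LWP): true if a HEAD request for $url
# succeeds. A HEAD response carries headers only, so no file body is read.
sub exists_by_head {
    my ($ua, $url) = @_;
    my $request = HTTP::Request->new(HEAD => $url);

    # simple_request() does not follow redirects, so a 3xx answer
    # counts as "not found" in this sketch.
    my $response = $ua->simple_request($request);
    return $response->is_success;
}

my $ua = LWP::UserAgent->new(env_proxy => 1);

# Check every URL given on the command line.
for my $url (@ARGV) {
    printf "%s: %s\n", $url, exists_by_head($ua, $url) ? 'found' : 'not found';
}
```

If redirects should be followed, $ua->request($request) can be used in place of simple_request.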


Re: LWP::UserAgent and HUGE FILES
by bart (Canon) on Apr 29, 2003 at 19:27 UTC
    If head() doesn't work... can't you use the callback interface? That way, you'll only read in blocks of a few k at a time, which you can then discard.

    The synopsis for LWP::UserAgent says the basic syntax is

    $response = $ua->request($request, \&callback, 4096);
    with &callback a sub receiving the data, and 4096 the chunk size.
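A minimal sketch of that callback idea, assuming all you need to know is whether the server starts sending the file: the callback dies on the first chunk, which makes LWP abandon the transfer and note the die message in an X-Died response header. The helper name exists_by_first_chunk is hypothetical, and the 4096 is only a hint, so the first chunk may be a different size.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper (not part of LWP): issues a GET but aborts as soon as
# the first chunk of content arrives. Returns the success flag and the
# number of bytes actually seen by the callback.
sub exists_by_first_chunk {
    my ($ua, $url) = @_;
    my $request    = HTTP::Request->new(GET => $url);
    my $bytes_seen = 0;

    my $response = $ua->request(
        $request,
        sub {
            my ($chunk, $res, $protocol) = @_;
            $bytes_seen += length $chunk;
            # Dying inside the callback aborts the request; the status
            # line and headers have already been received at this point.
            die "first chunk is enough\n";
        },
        4096,    # chunk size hint
    );

    return ($response->is_success, $bytes_seen);
}

my $ua = LWP::UserAgent->new(env_proxy => 1);

for my $url (@ARGV) {
    my ($ok, $bytes) = exists_by_first_chunk($ua, $url);
    printf "%s: %s (%d bytes read)\n", $url, $ok ? 'found' : 'not found', $bytes;
}
```

Because the callback discards each chunk, nothing accumulates in the response object, and memory use stays at roughly one chunk regardless of file size.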
Re: LWP::UserAgent and HUGE FILES
by BrowserUk (Pope) on Apr 29, 2003 at 18:54 UTC

    I think that the problem comes down to whether the server you are requesting from honours the Range: header. The max_size parameter uses that to request the first max_size bytes of the file. If the server doesn't honour this header (I believe it is optional, but I couldn't find the appropriate RFC), then you will likely get the lot. You might try the HEAD command, but that isn't always honoured either.

    If this is a show stopper to your application, you could consider hacking It ought to be possible to cross-reference any Range: header specified on the request and bottle out of the read loop early, or perhaps better, read and discard content greater than requested, but you're a brave man if you decide to mess with this stuff :)

    If neither the HEAD requests nor the Range: header are being honoured by the server, it might be better to nag the sysops of that server to enable/upgrade them.
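To see whether a given server honours Range: at all, a rough probe along these lines may help. It asks for the first byte only and looks at the status code: 206 means the range was honoured, a plain 200 means it was ignored. The helper name probe_with_range is hypothetical. The max_size of 1024 is a safety net, on the assumption (true of current LWP releases as far as I know) that LWP stops reading shortly after the limit is passed and adds a Client-Aborted header.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): requests only the first byte of
# $url. Returns a status word and the number of content bytes kept.
#   'partial' - server answered 206, so it honours Range:
#   'full'    - server answered with some other 2xx, so Range: was ignored
#   'missing' - request failed
sub probe_with_range {
    my ($ua, $url) = @_;
    my $response = $ua->get($url, Range => 'bytes=0-0');

    return ('missing', 0) unless $response->is_success;

    my $status = $response->code == 206 ? 'partial' : 'full';
    return ($status, length $response->content);
}

# max_size is an assumption-backed safety net for servers that ignore Range:.
my $ua = LWP::UserAgent->new(env_proxy => 1, max_size => 1024);

for my $url (@ARGV) {
    my ($status, $bytes) = probe_with_range($ua, $url);
    printf "%s: %s (%d bytes kept)\n", $url, $status, $bytes;
}
```

Content may end up somewhat longer than max_size, since the read is cut off only once a whole chunk has pushed the total past the limit.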

Re: LWP::UserAgent and HUGE FILES
by Aristotle (Chancellor) on Apr 30, 2003 at 13:43 UTC

