http://qs321.pair.com?node_id=254031

julsford has asked for the wisdom of the Perl Monks concerning the following question:

Monks ~ I've hit on one of those wonderful issues you only learn about when it's too late -- when using LWP::UserAgent to validate the existence of a file before creating a link to it, it reads the entire file into memory. When the files are huge, they run me out of system memory. I've attempted to set the following:
my $ua = LWP::UserAgent->new(env_proxy => 1, max_size => 100);
I don't appear to be having any success. Advice would be appreciated.

Juls

Replies are listed 'Best First'.
Re: LWP::UserAgent and HUGE FILES
by cacharbe (Curate) on Apr 29, 2003 at 18:10 UTC
Re: LWP::UserAgent and HUGE FILES
by bart (Canon) on Apr 29, 2003 at 19:27 UTC
    If head() doesn't work... can't you use the callback interface? That way, you'll only read in blocks of a few k at a time, which you can then discard.

    The synopsis for LWP::UserAgent says the basic syntax is

    $response = $ua->request($request, \&callback, 4096);
    with &callback a sub receiving the data, and 4096 the chunk size.
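    A minimal sketch of that approach (the URL is made up): the callback dies as soon as the first chunk arrives, which makes LWP abort the transfer, so only one small chunk is ever held in memory.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua  = LWP::UserAgent->new(env_proxy => 1);
    my $url = 'http://example.com/huge.iso';   # hypothetical URL

    # Die inside the callback after the first chunk; LWP catches the
    # die, aborts the transfer, and records it in the X-Died header.
    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->request($req, sub { die "got first chunk\n" }, 4096);

    # The status line has already been received, so is_success still
    # tells you whether the file exists -- without slurping the body.
    print "$url exists\n" if $res->is_success;
    ```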
Re: LWP::UserAgent and HUGE FILES
by shotgunefx (Parson) on Apr 29, 2003 at 18:39 UTC
    I believe some servers don't respond to HEAD appropriately, so if that doesn't work --
    from the LWP pod:
    $ua->simple_request($request, [$arg, $size])
    This method dispatches a single WWW request on behalf of a user, and returns the response received. The $request should be a reference to an HTTP::Request object with values defined for at least the method() and url() attributes. If $arg is a scalar it is taken as a filename where the content of the response is stored.

    If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size.

    If $arg is omitted, then the content is stored in the response object itself.
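    As a rough illustration of the filename form (the URL and path are made up), the body streams straight to disk instead of accumulating in the response object:

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua  = LWP::UserAgent->new(env_proxy => 1);
    my $url = 'http://example.com/huge.iso';   # hypothetical URL

    # Passing a filename as $arg makes LWP write the content to that
    # file in chunks, so it is never held in memory all at once.
    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->simple_request($req, '/tmp/check.dat', 8192);

    print "$url exists\n" if $res->is_success;
    ```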

    -Lee

    "To be civilized is to deny one's nature."
Re: LWP::UserAgent and HUGE FILES
by BrowserUk (Patriarch) on Apr 29, 2003 at 18:54 UTC

    I think that the problem comes down to whether the server you are requesting from honours the Range: header. The max_size parameter uses that header to request only the first max_size bytes of the file. If the server doesn't honour this header (I believe it's optional, but I couldn't find the appropriate RFC), then you will likely get the lot. You might try the HEAD command, but that isn't always honoured either.

    If this is a show stopper for your application, you could consider hacking LWP::Protocol::http.pm/request(). It ought to be possible to cross-reference any Range: header specified on the request and bail out of the read loop early, or, perhaps better, read and discard any content beyond what was requested -- but you're a brave man if you decide to mess with this stuff. :)

    If neither HEAD requests nor the Range: header are being honoured by the server, it might be better to nag that server's sysops to enable or upgrade support for them.
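
    A sketch combining those two fallbacks (the URL is hypothetical): try HEAD first, and if the server mishandles it, fall back to a one-byte ranged GET.

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $ua  = LWP::UserAgent->new(env_proxy => 1);
    my $url = 'http://example.com/huge.iso';   # hypothetical URL

    # HEAD fetches only the headers -- no body is transferred.
    my $res = $ua->head($url);

    # Some servers mishandle HEAD; fall back to asking for just the
    # first byte. Servers that honour Range: reply with 206 Partial
    # Content, which is_success also treats as success.
    unless ($res->is_success) {
        my $req = HTTP::Request->new(GET => $url);
        $req->header(Range => 'bytes=0-0');
        $res = $ua->request($req);
    }

    print "$url exists\n" if $res->is_success;
    ```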


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: LWP::UserAgent and HUGE FILES
by Aristotle (Chancellor) on Apr 30, 2003 at 13:43 UTC