PerlMonks  

Downloading image files using LWP

by gnangia (Scribe)
on Nov 05, 2002 at 19:15 UTC ( [id://210549]=perlquestion )

gnangia has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am using LWP to download images from a URL. However, after the first download, it looks at the local copy instead of getting the image from the web server. I modified my LWP request to turn off caching, but it did not help.

    my $request = HTTP::Request->new(GET => $url);
    $request->header('Pragma' => 'no-cache', 'Cache-Control' => 'no-cache', 'max-age' => '0');

Can anyone advise me as to what I can do to turn off caching?
Thanks,
GNangia

Replies are listed 'Best First'.
Re: Downloading image files using LWP
by Jenda (Abbot) on Nov 05, 2002 at 20:00 UTC

    Are you sure it uses a local copy? I'd think there is a proxy between you and the server that does this.

    Anyway ... if you need to be really sure try to append the timestamp to the URL:

    my $request = HTTP::Request->new(GET => $url . '?' . time());

    The parameter will most likely be ignored by the webserver, but the proxies will not dare to interfere.

    Of course, if there already is a query in the URL, add a new parameter only:

    my $request = HTTP::Request->new(GET => $url . '&tImEstaMp=' . time());
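
    The two cases above can be folded into one helper. This is only a minimal sketch: the name add_cache_buster, the parameter name timestamp, and the example.com URLs are made up for illustration and are not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper (not from LWP): append a throwaway timestamp
# parameter so that intermediate caches see a URL they have not stored.
sub add_cache_buster {
    my ($url, $stamp) = @_;
    $stamp = time() unless defined $stamp;

    # Keep any #fragment at the very end of the URL.
    my $fragment = $url =~ s/(#.*)\z//s ? $1 : '';

    # Use '&' when a query string is already present, '?' otherwise.
    my $sep = $url =~ /\?/ ? '&' : '?';

    return $url . $sep . 'timestamp=' . $stamp . $fragment;
}

# Placeholder URLs, one without and one with an existing query string.
print add_cache_buster('http://www.example.com/images/logo.gif'), "\n";
print add_cache_buster('http://www.example.com/image?id=7'), "\n";
```

    The result can be passed straight to HTTP::Request->new(GET => ...). Note that this only defeats caches keyed on the URL; it does nothing about a conditional GET issued by the client itself.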

    Jenda

      Ok here is my code,
      while (my $url = shift @urls) {
          print "URL is $url\n";
          my $request = HTTP::Request->new(GET => $url);
          my $parser  = HTML::Parser->new(api_version => 3);
          $parser->handler(start => \&start, 'self,tagname,attr');
          my $response = $browser->request($request);
          if ($response->is_success) {
              print $response->content();
              $parser->{base}    ||= $response->base;
              $parser->{browser} ||= $browser;
              $parser->parse($response->content);
              $parser->eof();
          }
          else {
              print "ERROR: " . $response->status_line . "\n";
          }
      }

      sub start {
          my ($parser, $tagname, $attr) = @_;
          if ($tagname eq 'img') {
              if ($attr->{src}) {
                  my $img_url     = $attr->{src};
                  my $remote_name = URI->new_abs($img_url, $parser->{base});
                  #my ($local_name) = $img_url =~ m!([^/]+)$!;
                  my $local_name = $remote_name->host . $remote_name->path;
                  #my $local_name = "/dev/null";
                  mkpath(dirname($local_name), 0, 0711);
                  print "Getting imagefile: $img_url\n";
                  my $response = $parser->{browser}->mirror($remote_name, $local_name);
                  print STDERR "YYY-$local_name: ", $response->message, "\n";
              }
          }
      }
      Here is the output when I run it the second time:
      Getting imagefile: images/logo.gif
      LWP::UserAgent::mirror: ()
      LWP::UserAgent::request: ()
      HTTP::Cookies::add_cookie_header: Checking www.google.com for cookies
      HTTP::Cookies::add_cookie_header: Checking .google.com for cookies
      HTTP::Cookies::add_cookie_header: - checking cookie path=/
      HTTP::Cookies::add_cookie_header: - checking cookie PREF=ID=0f9d8bbb3b0ee898:TM=1036535059:LM=1036535059:S=2ea2eKPQlO4uYAN6
      HTTP::Cookies::add_cookie_header: it's a match
      HTTP::Cookies::add_cookie_header: Checking google.com for cookies
      HTTP::Cookies::add_cookie_header: Checking .com for cookies
      LWP::UserAgent::send_request: GET http://www.google.com/images/logo.gif
      LWP::UserAgent::_need_proxy: Not proxied
      LWP::Protocol::http::request: ()
      LWP::UserAgent::request: Simple response: Not Modified
      YYY-www.google.com/images/logo.gif: 304 Not Modified
        You are using LWP::UserAgent::mirror(), which does the local caching. It checks for the local file, uses its timestamp in an If-Modified-Since header, and does a conditional GET.
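
        To make the conditional GET concrete, here is a rough sketch of the kind of request mirror() ends up sending. It is an approximation for illustration, not a copy of the LWP source; the helper name conditional_get_request and the example.com URL are invented.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Date qw(time2str);

# Hypothetical helper: build a GET request that carries the local file's
# modification time in If-Modified-Since, as a mirroring client would.
sub conditional_get_request {
    my ($url, $file) = @_;

    my $request = HTTP::Request->new(GET => $url);

    # Element 9 of stat() is the mtime; it is undef if the file is missing.
    my $mtime = (stat $file)[9];
    $request->header('If-Modified-Since' => time2str($mtime)) if $mtime;

    return $request;
}

# Print the request that would go out for a local logo.gif, if one exists.
print conditional_get_request('http://www.example.com/images/logo.gif',
                              'logo.gif')->as_string;
```

        When the header is present and the server's copy is not newer, the server answers 304 Not Modified and sends no body, which matches the debug output above.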

        Since you want to force the file to be downloaded, either don't use mirror, or delete the local file before you call it.

        The UserAgent request method takes a filename as the second parameter. It will create (or overwrite) the file with the downloaded contents. You should check that the download succeeded and returned the expected number of bytes.
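
        A minimal sketch of that approach follows. The helper name fetch_to_file and its return convention are assumptions made for this example; the size check only applies when the server sends a Content-Length header.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: always download $url into $file, without the
# If-Modified-Since check that mirror() performs.
# Returns (1, status line) on success, (0, reason) on failure.
sub fetch_to_file {
    my ($ua, $url, $file) = @_;

    my $request = HTTP::Request->new(GET => $url);

    # A filename as the second argument makes LWP write the response
    # body to that file instead of keeping it in memory.
    my $response = $ua->request($request, $file);

    return (0, $response->status_line) unless $response->is_success;

    # LWP records a failure while saving the body in the X-Died header.
    if (my $died = $response->header('X-Died')) {
        return (0, "transfer aborted: $died");
    }

    # Compare the size on disk with Content-Length when one was sent.
    my $expected = $response->content_length;
    my $actual   = -s $file;
    if (defined $expected && (!defined $actual || $actual != $expected)) {
        return (0, "size mismatch: expected $expected bytes");
    }

    return (1, $response->status_line);
}

# Usage: perl fetch.pl URL LOCALFILE
if (@ARGV == 2) {
    my ($ok, $msg) = fetch_to_file(LWP::UserAgent->new, @ARGV);
    print $ok ? "saved: $msg\n" : "failed: $msg\n";
}
```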

        Are you sure that the remote image is getting changed between iterations? If not, then the program is doing what it should. If you want the image regardless of whether or not it has been modified, then delete or rename the local copy.

        To test, create a logo.gif file and copy it in place of the cached version before making the new request. It should notice a later last modified time and get the newer file.

        ~Hammy

        At a guess, I'd say lose that cookie between runs.  It looks like Google's being smart about whether you already got it :).

          p
Re: Downloading image files using LWP
by traveler (Parson) on Nov 05, 2002 at 19:30 UTC
    You don't say much about your setup, but I have found that more than one web caching (read "proxy") server seems to ignore the no-cache directive. If you have a caching server between the client and the Internet, that may be the issue.

    HTH, --traveler

Re: Downloading image files using LWP
by fruiture (Curate) on Nov 05, 2002 at 19:26 UTC

    Define "using LWP"!

    The Cache-Control request headers control caching on the server side, but you say the request is not even sent to the server, so they're useless here. Your HTTP client must be configured not to cache, so what does the client code look like (LWP::UserAgent or LWP::Simple ...)?

    --
    http://fruiture.de
Re: Downloading image files using LWP
by fglock (Vicar) on Nov 05, 2002 at 19:25 UTC

    I usually just rename my local copy.
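
    One way this might look in code, as a sketch only: the helper name mirror_fresh and the ".bak" suffix are invented for the example. With no file at the target path, mirror() has no timestamp to send, so the server returns the full image.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: move any existing local copy aside, then mirror.
# Returns the HTTP::Response object from mirror().
sub mirror_fresh {
    my ($ua, $url, $file) = @_;

    if (-e $file) {
        my $backup = "$file.bak";    # assumed naming scheme
        rename $file, $backup
            or die "Cannot rename $file to $backup: $!";
    }

    return $ua->mirror($url, $file);
}

# Usage: perl mirror_fresh.pl URL LOCALFILE
if (@ARGV == 2) {
    my $response = mirror_fresh(LWP::UserAgent->new, @ARGV);
    print $response->status_line, "\n";
}
```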

Node Type: perlquestion [id://210549]
Approved by valdez