PerlMonks  

Wanted: LWP::Cache

by smalhotra (Scribe)
on Aug 04, 2003 at 18:58 UTC

smalhotra has asked for the wisdom of the Perl Monks concerning the following question:

I write a lot of apps that get data from the web. A lot of the time I use WWW::Mechanize or just LWP::UserAgent directly. What I want is a cache like the one most browsers (user agents) have. Often the requested content is dynamic, but most of the time, when I am doing some sort of testing (parsing pages), I request the same page several times. I generally use one of two solutions:
(1) use Memoize on request()
(2) write a wrapper that looks in a specified directory for the filename that I am requesting.

Over time I add things like an up-to-date test (1 line) or a directory size check (2-3 lines). Is there a module out there that already does this, not generally, but specifically for LWP? If not, then I am going to write LWP::Cache and would like some advice on what to add and what to stay away from.
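A minimal sketch of solution (2) with the up-to-date test, for illustration only. The name `cached_get` and its arguments are hypothetical, and the fetcher is passed in as a code reference so the caching logic does not depend on any particular user agent.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Path qw(make_path);

# Hypothetical helper: return the body for $url. A copy in $dir younger than
# $max_age seconds is reused; otherwise $fetch->($url) is called and its
# result (expected to be raw bytes, or undef on failure) is saved.
sub cached_get {
    my ($url, $dir, $max_age, $fetch) = @_;
    make_path($dir) unless -d $dir;

    # One file per URL, named after the MD5 of the URL.
    my $file = File::Spec->catfile($dir, md5_hex($url));

    # The "up-to-date test": the file exists and is young enough.
    if (-e $file && time - (stat $file)[9] < $max_age) {
        open my $in, '<:raw', $file or die "Cannot read $file: $!";
        local $/;                       # slurp the whole file
        return scalar <$in>;
    }

    my $body = $fetch->($url);
    return unless defined $body;        # do not cache failures

    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $body;
    close $out or die "Cannot close $file: $!";
    return $body;
}
```

In real use the fetcher would be a closure around your user agent that returns the raw response body; a directory size check would be a separate pass over `$dir`.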

$will->code for @food or $$;

Replies are listed 'Best First'.
Re: Wanted: LWP::Cache
by valdez (Monsignor) on Aug 04, 2003 at 20:04 UTC

    I think that your problem is different. We already have a caching mechanism in LWP::Simple::mirror. What you are asking for is something that defeats the purpose of the HEAD method for content already stored somewhere and that will not obey no-cache instructions. This is reasonable, especially when you don't want to overload an external resource. I have thought about this kind of cache many times and tried a few times to write a module.

    I came up with a 'not_before sometime' policy that is well suited to Cache::Cache, mentioned by PodMaster; furthermore, that module can be used to cache complex URLs. If you are interested I can post some code.

    At the other end of the spectrum there are proxies, they are built to do this kind of work.

    Ciao, Valerio

Re: Wanted: LWP::Cache
by PodMaster (Abbot) on Aug 04, 2003 at 19:27 UTC
    There isn't a module specific to LWP that I know of, but I don't think one is needed (see Cache::Cache).

    update: On second thought, transparently handling things sounds good, go for it.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      I know you already know all of this, but I replied to you because it seemed like the place. ;) I just had to take a stab at this:
      use strict;
      use warnings;
      use LWP::Simple;
      use Cache::MemoryCache;

      my $cache = Cache::MemoryCache->new({
          namespace          => 'MyNamespace',
          default_expires_in => 60,
      });

      for (1..3) {
          my $page = $cache->get('perlmonks');
          unless ($page) {
              warn "fetching from web\n";
              $page = get('http://perlmonks.org');
              $cache->set('perlmonks', $page, "1 minute");
          }
          else {
              warn "fetching from cache\n";
          }
      }
      The for loop is just to actually show that subsequent fetches (after the first) come from the memory cache and not the web. Use Cache::FileCache if you want persistence past the life of script execution.

      Honestly, as easy as this is ... it's still 'plumbing'. I would like to see a 'LWP::Cache' module as long as it is a transparent wrapper around LWP and Cache::Cache. Why not? ;)
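A rough sketch of what such a transparent wrapper might look like, assuming LWP::UserAgent and Cache::Cache are installed. The class name `My::CachingUA`, the `cache_expires` option, and the `X-From-Cache` header are all made up for illustration. It ignores no-cache instructions and caches only plain successful GETs, so treat it as a starting point rather than a finished design.

```perl
use strict;
use warnings;

# Hypothetical class name; not a published module.
package My::CachingUA {
    use parent 'LWP::UserAgent';
    use Cache::MemoryCache;
    use HTTP::Response;

    sub new {
        my ($class, %args) = @_;
        # 'cache_expires' (seconds) is a made-up option for this sketch.
        my $expires = delete $args{cache_expires} // 60;
        my $self = $class->SUPER::new(%args);
        $self->{my_cache} = Cache::MemoryCache->new({
            namespace          => 'MyCachingUA',
            default_expires_in => $expires,
        });
        return $self;
    }

    sub request {
        my ($self, $request, @rest) = @_;

        # Pass through anything that is not a plain GET: other methods,
        # content callbacks/files, and LWP's internal redirect follow-ups.
        my $has_extra = grep { defined } @rest;
        return $self->SUPER::request($request, @rest)
            if $request->method ne 'GET' || $has_extra;

        my $key = $request->uri->as_string;
        if (my $hit = $self->{my_cache}->get($key)) {
            # Rebuild a minimal response; the original headers are not kept.
            return HTTP::Response->new(200, 'OK',
                [ 'Content-Type' => $hit->{type}, 'X-From-Cache' => 1 ],
                $hit->{content});
        }

        my $response = $self->SUPER::request($request);
        $self->{my_cache}->set($key, {
            type    => scalar $response->content_type,
            content => $response->content,
        }) if $response->is_success;
        return $response;
    }
}
```

Existing code would only need to construct `My::CachingUA->new` instead of `LWP::UserAgent->new`; swapping in Cache::FileCache would make the cache survive between runs.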

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
Re: Wanted: LWP::Cache
by LTjake (Prior) on Aug 04, 2003 at 20:15 UTC

    Something to think about...

    If you store the Last-Modified and ETag header values and send them along in your next request (as If-Modified-Since and If-None-Match respectively), a 304 code will be returned if the page hasn't changed. It should minimize some traffic for pages that don't change too often.
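The idea above could be sketched roughly as follows. The helper name `conditional_get` and the in-memory `$store` hash are assumptions for illustration; a real cache would keep the validators and the body on disk or in Cache::Cache.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper. $store is a hash ref mapping
# URL => { etag, last_modified, content } from earlier responses.
sub conditional_get {
    my ($ua, $url, $store) = @_;
    my $entry = $store->{$url};

    # Send back whatever validators the server gave us last time.
    my @validators;
    if ($entry) {
        push @validators, 'If-None-Match' => $entry->{etag}
            if defined $entry->{etag};
        push @validators, 'If-Modified-Since' => $entry->{last_modified}
            if defined $entry->{last_modified};
    }

    my $res = $ua->get($url, @validators);

    # 304 Not Modified: the stored copy is still good.
    return $entry->{content} if $entry && $res->code == 304;

    return unless $res->is_success;

    # Fresh copy: remember the body and its validators for next time.
    $store->{$url} = {
        etag          => scalar $res->header('ETag'),
        last_modified => scalar $res->header('Last-Modified'),
        content       => $res->content,
    };
    return $res->content;
}
```

Note that this still makes one round trip per call; it saves transfer of the body, not the request itself.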

    --
    "To err is human, but to really foul things up you need a computer." --Paul Ehrlich

•Re: Wanted: LWP::Cache
by merlyn (Sage) on Aug 04, 2003 at 22:52 UTC
    I've had on my to-do list for a while to implement a transparent cache for LWP::UserAgent, because I've run across similar things to you.

    But instead of doing that, you could also just set up Apache's mod_proxy to be a caching proxy, and then point your UserAgent's proxy settings at your apache server. That'd effectively be the same thing.
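On the Perl side, pointing a user agent at such a proxy is a one-liner. A small sketch, where the helper name and the proxy address are placeholders for wherever your caching proxy actually listens:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent whose http requests go through
# the given caching proxy.
sub make_proxied_ua {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->proxy('http', $proxy_url);   # route all http:// requests via the proxy
    return $ua;
}

# Placeholder address; use the host and port of your own proxy.
my $ua = make_proxied_ua('http://localhost:8080/');
```

Calling `$ua->env_proxy` instead would pick the proxy up from the usual `http_proxy` environment variable, which keeps the address out of the script.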

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: Wanted: LWP::Cache
by smalhotra (Scribe) on Aug 04, 2003 at 20:35 UTC
    Thanks for the ideas. It was exactly what I came looking for when I posted. I am trying to think whether LWP::Cache will actually provide anything beneficial over LWP::UserAgent::mirror() (which is what I was planning to use) and Cache::File. Please feel free to continue posting ideas here or here. I like the wiki as a brainstorming platform or as a way to summarize a thread like this one.

    $will->code for @food or $$;
