Re^2: Web Page Expiry

by jai_dgl (Beadle)
on Nov 14, 2008 at 09:26 UTC ( [id://723599]=note )


in reply to Re: Web Page Expiry
in thread Web Page Expiry

Hi, I used the mirror function of LWP::Simple.
But it doesn't compare the file content. It only checks the server's last-modified time against the local cache file.
If the last-modified time on the server is newer than that of the local file, it rewrites the local file.
My requirement is to compare the local file content with the server page content. Is there any module that does this?

thx

Re^3: Web Page Expiry
by MidLifeXis (Monsignor) on Nov 14, 2008 at 15:00 UTC

    Usually on the web, the If-Modified-Since approach (used behind the scenes by LWP::Simple::mirror) is the preferred solution. However, if that approach is not reliable for you (and therefore not reliable for others either), you will probably need to fetch the file and compare it with your local copy. You could probably start your search here on cpan if you want to use a Perl solution. If you don't limit yourself to Perl, there are other tools that you can use (like diff or rdist).
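If you do go the fetch-and-compare route, it could be sketched roughly as below. This is only a minimal sketch, not a tested solution: the sub names content_differs and update_if_changed are made up for illustration, and it assumes the page is served as identical bytes each time (no embedded timestamps, session ids, or the like). It uses LWP::UserAgent rather than LWP::Simple so that the raw, undecoded bytes are compared.

```perl
use strict;
use warnings;

# Return 1 if $remote_content differs from the bytes stored in $local_path,
# 0 if they are identical. A missing local file counts as "different".
sub content_differs {
    my ($local_path, $remote_content) = @_;
    return 1 unless -e $local_path;
    open my $fh, '<:raw', $local_path or die "Cannot read $local_path: $!";
    my $local_content = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $local_content = '' unless defined $local_content;
    return $local_content ne $remote_content ? 1 : 0;
}

# Fetch $url and rewrite $local_path only when the content has changed.
# Returns 1 if the file was rewritten, 0 if it was left alone.
sub update_if_changed {
    my ($url, $local_path) = @_;
    require LWP::UserAgent;                       # not core; loaded only when fetching
    my $response = LWP::UserAgent->new->get($url);
    die "Could not fetch $url: " . $response->status_line
        unless $response->is_success;
    my $remote_content = $response->content;      # raw bytes, not decoded
    return 0 unless content_differs($local_path, $remote_content);
    open my $out, '>:raw', $local_path or die "Cannot write $local_path: $!";
    print {$out} $remote_content;
    close $out or die "Cannot close $local_path: $!";
    return 1;
}
```

Something like update_if_changed('http://example.com/page.html', 'page.html') would then touch the local file only when the bytes really differ (the URL and filename here are placeholders).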

    On the other hand, if all you are looking to do is see if the file needs to be updated locally, and you have to retrieve the file to determine that anyway, why not just update the file?

    As an alternative, is there a checksum (MD5, etc.) file generated for the file on the remote server? If so, you could retrieve that instead (in theory it should be smaller), and compare it against the checksum of your local copy to determine if you need to download the real file.
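The checksum idea might look something like this. Again just a sketch under assumptions: it presumes the remote site publishes an MD5 digest in md5sum-style text ("<32 hex digits>  filename"), which you would have to confirm with the site, and the sub names local_md5 and needs_download are hypothetical. Digest::MD5 ships with Perl.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of the bytes in a local file.
sub local_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Given the text of a remote checksum file, return 1 if the local file is
# missing or its digest does not match, 0 if the local copy is up to date.
sub needs_download {
    my ($local_path, $checksum_text) = @_;
    # Take the first 32-hex-digit token as the digest (assumed md5sum format).
    my ($remote_md5) = $checksum_text =~ /\b([0-9a-fA-F]{32})\b/
        or die "No MD5 digest found in checksum text";
    return 1 unless -e $local_path;
    return lc($remote_md5) ne local_md5($local_path) ? 1 : 0;
}
```

You would fetch the small checksum file first, pass its text to needs_download, and only pull the real file when it returns true.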

    I would also see if you can work with the source site to get their timestamps correct so that the mirror process works. That is the RightWay™ to do it. If the timestamps cannot be trusted, then any cache in the way (your workstation's local cache, your company's, your ISP's, accelerators on the remote side, etc.) can hose your checks anyway.

    --MidLifeXis
