Maybe LWP::UserAgent::WithCache would do what you want.
Generally, you should stat the local copy of the page to get its modification time. Pass that through as a header ("If-Modified-Since") in your request to the web server. The server should be able to check the date and either offer up the full page or a short "no change" message (HTTP status 304 Not Modified).
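Below is a minimal sketch of that conditional request. It uses HTTP::Tiny (core since Perl 5.14) instead of LWP so that it has no non-core dependencies. The sub names `http_date` and `fetch_if_modified` are hypothetical and are not part of either library. If you would rather not write this yourself, LWP::UserAgent's `mirror` method sends the same header for you.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Day and month names are spelled out by hand so the result does not
# depend on the current locale (strftime's %a/%b would).
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an RFC 1123 HTTP date,
# e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: GET $url, sending If-Modified-Since set to the
# local copy's mtime (if the file exists). Returns the new content, or
# nothing (undef in scalar context) if the server answered 304.
sub fetch_if_modified {
    my ($url, $local_path) = @_;
    my %headers;
    if (my @st = stat $local_path) {
        $headers{'If-Modified-Since'} = http_date($st[9]);   # field 9 is mtime
    }
    my $res = HTTP::Tiny->new->get($url, { headers => \%headers });
    return if $res->{status} == 304;                         # not modified
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}
```

Usage would look like `my $html = fetch_if_modified($url, 'page.html');`, where you save `$html` only if it is defined. The check is only as good as the local mtime. The `mirror` methods of both LWP::UserAgent and HTTP::Tiny set the saved file's mtime from the response's Last-Modified header, so the server ends up comparing its own timestamp against itself.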
If you are only interested in whether the web page changed, and not how, you can use File::Compare (core since Perl 5.004).
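Here is a minimal sketch of that approach. It downloads the page to a temporary file and lets File::Compare decide, where `compare` returns 0 if the files are equal, 1 if they differ, and -1 on error. The download uses the core HTTP::Tiny module, and the helper names `files_differ` and `page_changed` are hypothetical.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempfile);
use HTTP::Tiny;

# Hypothetical helper: true if the two files differ in content.
# File::Compare::compare returns 0 (same), 1 (different) or -1 (error).
sub files_differ {
    my ($left, $right) = @_;
    my $rc = compare($left, $right);
    die "cannot compare $left and $right\n" if $rc < 0;
    return $rc == 1;
}

# Hypothetical helper: fetch $url into a temp file (removed at exit)
# and report whether it differs from the saved copy at $local_path.
sub page_changed {
    my ($url, $local_path) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    my ($fh, $tmp) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $res->{content};
    close $fh or die "close $tmp: $!\n";
    return files_differ($local_path, $tmp);
}
```

A byte-for-byte comparison is strict. A page that embeds the current time or a rotating ad will show up as "changed" on every fetch.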
Hi,
I used the mirror functionality of LWP::Simple.
But it doesn't compare the file content. It only compares the server's last-modified time with that of the local cache file.
If the last-modified time on the server differs from the local file's, it rewrites the local file.
Is there any module that compares the local file's content with the server page's content?
thx
Usually on the web, the If-Modified-Since approach (used behind the scenes by LWP::Simple::mirror) is the preferred solution. However, if the server's modification times are not reliable (and therefore not reliable for anyone else either), you will probably need to fetch the file and compare the two copies. If you want a Perl solution, CPAN is probably the place to start your search. If you don't limit yourself to Perl, there are OS-specific tools you can use (like diff or rdist).
On the other hand, if all you want is to see whether the local file needs updating, and you have to retrieve the remote file to determine that anyway, why not just update the file?
As an alternative, is there a checksum file (MD5, etc.) published for the file on the remote server? If so, you could retrieve that instead (in theory it should be much smaller) and compare it against a checksum of your local copy to decide whether you need to download the real file.
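Here is a minimal sketch of the checksum check using the core Digest::MD5 module. It assumes the server publishes the checksum in `md5sum` format (`<32 hex digits>  <filename>`). The helper names are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# MD5 of a local file as a lowercase hex string.
sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!\n";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Hypothetical helper: pull the digest out of an md5sum-style line
# ("<32 hex digits>  <name>") and normalise it to lowercase.
sub parse_md5_line {
    my ($line) = @_;
    my ($hex) = $line =~ /^\s*([0-9a-fA-F]{32})\b/
        or die "not an MD5 checksum line: $line\n";
    return lc $hex;
}

# Hypothetical helper: true if there is no local copy, or if it no
# longer matches the published checksum (so re-download the real file).
sub needs_download {
    my ($local_path, $checksum_line) = @_;
    return 1 unless -e $local_path;
    return file_md5_hex($local_path) ne parse_md5_line($checksum_line);
}
```

The checksum line itself would come from a small GET, for example `HTTP::Tiny->new->get("$url.md5")->{content}`. The `.md5` suffix is only an assumed naming convention, so check what the remote site actually publishes.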
I would also see whether you can work with the source site to get their timestamps right so the mirror process works. That is the RightWay™ to do it. If that does not work out, then any cache in the way (your workstation's local cache, your company's or ISP's cache, accelerators on the remote side, etc.) can hose your checks anyway.
You could record when you first accessed the document (or stat a saved file) and then issue a HEAD request for the page on subsequent attempts. For example, LWP::Simple::head() returns, among other things, the last-modified time.
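Here is a minimal sketch of the HEAD-request approach using LWP::Simple::head. In list context, head() returns `($content_type, $document_length, $modified_time, $expires, $server)`, or an empty list on failure. LWP is loaded with `require` so the comparison helper can be used without it. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether the remote page is newer than the
# saved copy. $remote_mtime is epoch seconds, or undef if unknown.
sub remote_is_newer {
    my ($remote_mtime, $local_path) = @_;
    my @st = stat $local_path or return 1;   # no local copy yet
    return 1 unless defined $remote_mtime;   # no Last-Modified: assume changed
    return $remote_mtime > $st[9];           # field 9 is the local mtime
}

# Hypothetical helper: HEAD the page and compare its modified time
# (third value returned by LWP::Simple::head) with the local file.
sub check_page {
    my ($url, $local_path) = @_;
    require LWP::Simple;
    my @info = LWP::Simple::head($url)
        or die "HEAD $url failed\n";
    return remote_is_newer($info[2], $local_path);
}
```

Comparing against the local mtime only makes sense if that mtime reflects the server's Last-Modified value (as `mirror` arranges). Otherwise, compare against the access time you recorded on the first fetch.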