PerlMonks  

How to get web creation date from webserver?

by gube (Parson)
on Aug 23, 2005 at 04:06 UTC ( [id://485812]=perlquestion )

gube has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have to get a web page's creation date from the web server. I have a URL, "http://jagadish.blogspot.com/index.html", and I want to get the creation date of this page. Is that possible through Perl? Please help me.

Thanks in advance.

Replies are listed 'Best First'.
Re: How to get web creation date from webserver?
by merlyn (Sage) on Aug 23, 2005 at 05:18 UTC
    I have to get web page creation date from the web server.
    Why? It's not necessarily available. It might not even make sense. If the server says "last-modified", you might use that, but that may or may not be present, and that may or may not be what you consider a "creation date".

    Perhaps you need to step back and ask yourself why you have this problem. That's sometimes a useful problem solving technique when you've asked a question that is impossible to answer.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Hi all,

      Using httrack, I have downloaded the page from the given URL and am working on it offline. I have to check the URL daily to see whether the page has been modified. If the page has been modified, I have to download it again; otherwise I have to exit. For this purpose I want to get the date. Is that possible? Please help me.

        Read the HTTP specification. Specifically, section 14.25, 'If-Modified-Since'.

        You send back the 'Last-Modified' timestamp you received when you cached the file (or the date you fetched it, but then you have to deal with generating the date format) in an 'If-Modified-Since' request header. If the file hasn't been modified and the webserver supports this header, it should return a '304' status rather than the full content all over again.
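A minimal sketch of such a conditional request, assuming the core HTTP::Tiny module (which the thread does not mention). The helper names `http_date`, `page_status`, and `fetch_if_modified` are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;   # assumption: core since Perl 5.14, not named in the thread

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format epoch seconds as an HTTP date such as
# "Tue, 23 Aug 2005 04:36:44 GMT". Names are hard-coded so the
# result does not depend on the current locale.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Classify an HTTP::Tiny response hash. A 304 is not a 2xx status,
# so it has to be checked before the generic success flag.
sub page_status {
    my ($res) = @_;
    return 'unchanged' if $res->{status} == 304;
    return 'changed'   if $res->{success};
    return 'error';
}

# Hypothetical helper: GET the URL only if it changed after $since_epoch.
sub fetch_if_modified {
    my ($url, $since_epoch) = @_;
    return HTTP::Tiny->new->get($url, {
        headers => { 'If-Modified-Since' => http_date($since_epoch) },
    });
}

# Usage sketch (needs network access; $last_download is your own record):
#   my $res = fetch_if_modified('http://jagadish.blogspot.com/index.html',
#                               $last_download);
#   print page_status($res), "\n";
```

Whether the server honours the header is up to the server, so a full 200 response for an unchanged page is still possible.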

        So you don't want to know when the page was created; you want to know whether the page has changed since you last visited or downloaded it. I don't know of a ready-made Perl way to do this, but there are a lot of programs, e.g. webmon.


        holli, /regexed monk/
Re: How to get web creation date from webserver?
by pg (Canon) on Aug 23, 2005 at 04:32 UTC
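    use strict;
    use warnings;
    use IO::Socket::INET;

    # Open a plain TCP connection to the web server.
    my $connection = IO::Socket::INET->new(
        Proto    => "tcp",
        PeerAddr => "jagadish.blogspot.com",
        PeerPort => 80,
    ) or die "failed: $@";

    # Send a HEAD request so that only the headers come back.
    print $connection "HEAD http://jagadish.blogspot.com/index.html HTTP/1.1\r\n"
        . "Host: jagadish.blogspot.com\r\n\r\n";

    # Print the response headers; stop at the blank line that ends them.
    while (<$connection>) {
        ($_ eq "\r\n") ? last : print;
    }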

      Dear pg,

      I am getting the output below. The date it shows is today's date; whatever URL I give, I get today's date. But the pages were created long ago. I want the creation date of the web page on that server. Please help me with any other possibilities.

      here

      HTTP/1.1 200 OK
      Date: Tue, 23 Aug 2005 04:48:24 GMT
      Server: Apache
      Vary: Accept-Encoding
      test: %{HOSTNAME}e
      Last-Modified: Tue, 23 Aug 2005 04:36:44 GMT
      ETag: "84cadf-ada8-430aa7dc"
      Accept-Ranges: none
      Content-Length: 44456
      Content-Type: text/html
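To work with headers like these programmatically, the Last-Modified value can be pulled out and converted to epoch seconds. This is a sketch using only the core Time::Local module; the helper names are hypothetical, and it handles only the date format shown in the output above, not the older formats HTTP also permits.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Pull the Last-Modified value out of a block of raw response headers.
# Returns undef when the server did not send the header.
sub last_modified_header {
    my ($headers) = @_;
    return $headers =~ /^Last-Modified:\s*(.+?)\s*$/mi ? $1 : undef;
}

# Convert a date such as "Tue, 23 Aug 2005 04:36:44 GMT" into epoch
# seconds. Returns undef if the string is not in that format.
sub http_date_to_epoch {
    my ($date) = @_;
    my ($mday, $mon, $year, $h, $m, $s) =
        $date =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH{$mon};
    return timegm($s, $m, $h, $mday, $MONTH{$mon}, $year);
}
```

The epoch value can then be compared numerically with the time of the last download.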
        Could it be that these pages are dynamically generated and as such do not really exist on the file system, other than in a virtual sense? That could explain why you always see today's date: the files were freshly baked at your request.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        Last-Modified gives you the time when the content was BELIEVED to be last modified (or the time the current content was created). If you want the exact time when the original content (which is probably gone already) was first created, I don't believe the HTTP specification provides that. However, you have to rethink whether that time is meaningful.

        Well, if you actually have access to the server, that is a different story: you can just use whatever command the OS provides to check, as long as the content is not created on the fly.
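With shell access to the server, Perl's own stat can stand in for the OS command. A sketch using the core File::stat module; `file_mtime` is a hypothetical helper name. Note that most Unix filesystems expose a modification time rather than a true creation time (ctime there is the inode change time), so this reports the last modification.

```perl
use strict;
use warnings;
use File::stat;   # core module; object interface to stat()

# Return the last-modification time (epoch seconds) of a file on the
# local filesystem, or undef if the file cannot be stat'ed.
sub file_mtime {
    my ($path) = @_;
    my $st = stat($path) or return;
    return $st->mtime;
}

# Usage sketch (the path is a hypothetical document root):
#   my $mtime = file_mtime('/var/www/html/index.html');
#   print scalar(localtime $mtime), "\n" if defined $mtime;
```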

      TIMTOWTDI:

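      use LWP::Simple;
      # In list context head() returns (content type, document length,
      # modified time, expires, server); element 2 is the Last-Modified
      # time in epoch seconds.
      print @{[head('http://jagadish.blogspot.com/index.html')]}[2];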
      --
      b10m

      All code is usually tested, but rarely trusted.
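Building on the head() call above, the daily "download only if changed" check could be sketched like this. `needs_download` is a hypothetical helper and the local file name is an assumption; LWP::Simple's mirror() function may already cover the whole task in one call, since it sends If-Modified-Since based on the local file.

```perl
use strict;
use warnings;

# Decide whether the local copy is stale, given the remote Last-Modified
# time in epoch seconds (undef if the server did not send one).
sub needs_download {
    my ($remote_mtime, $local_file) = @_;
    return 1 unless -e $local_file;          # no local copy yet
    return 1 unless defined $remote_mtime;   # unknown, so fetch to be safe
    my $local_mtime = (stat $local_file)[9]; # element 9 of stat() is mtime
    return $remote_mtime > $local_mtime ? 1 : 0;
}

# Usage sketch (needs LWP::Simple and network access):
#   use LWP::Simple qw(head getstore);
#   my $url    = 'http://jagadish.blogspot.com/index.html';
#   my $file   = 'index.html';               # hypothetical local path
#   my $remote = (head($url))[2];
#   getstore($url, $file) if needs_download($remote, $file);
```

For dynamically generated pages the server reports a fresh Last-Modified on every request, so this check would download every time.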

Node Type: perlquestion [id://485812]
Approved by gopalr