Website Statistics

by debiandude (Scribe)
on Jun 02, 2003 at 13:25 UTC ( [id://262361] : perlquestion )

debiandude has asked for the wisdom of the Perl Monks concerning the following question:

I would like to write some code to keep statistics about my site:

  • Unique visitors (check by ip)
  • Browsers Used
  • Operating System Used
  • Number of Visits

The only way I could think of doing this would be to use the Apache log files. However, I was wondering if there is a way I could do this in my CGI script instead of parsing the Apache log every time someone comes to my site. TIA.


Michael

Re: Website Statistics
by Juerd (Abbot) on Jun 02, 2003 at 13:28 UTC
      I wouldn't mind looking at some other code for inspiration, but the site has been a project of mine for some time and I would like to keep it all my own code.
Re: Website Statistics
by Tanalis (Curate) on Jun 02, 2003 at 13:37 UTC
    Hmm. I guess what you could do would be to write a CGI that detects the browser, OS and IP of the client, and then uses this data to write to a statistics file/db, checking beforehand that the IP is unique.
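    A minimal sketch of that idea, assuming a flat tab-separated log file rather than a real database (the filename and field layout here are made up for illustration):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use CGI;
        use Fcntl qw(:flock);

        my $q = CGI->new;

        # Pull the visitor details out of the CGI environment.
        my $ip    = $q->remote_addr;              # client IP address
        my $agent = $q->user_agent || 'unknown';  # raw User-Agent (browser + OS)

        # Append one tab-separated record per hit; flock keeps
        # concurrent requests from interleaving their writes.
        open my $log, '>>', '/tmp/site-stats.log' or die "open: $!";
        flock $log, LOCK_EX;
        print $log join("\t", time, $ip, $agent), "\n";
        close $log;

        print $q->header, "<html><body>Hello!</body></html>\n";

    A nightly job could then reduce that file to unique IPs and browser/OS counts offline.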

    This could probably work well for a small site, with a smallish number of hits - remember that you'd have to maintain a list of every single IP that visits your site, and check against it every single time the index got loaded. That could get very slow very fast, especially with a high-volume site.

    Your other problem with using the IP to check uniqueness is that an IP doesn't identify a person. The vast majority of the internet (from a client point of view) uses dynamically assigned IP addresses, not static ones, so even if the same IP visits twice you can't guarantee it's the same visitor - it could be someone completely different on the same ISP.

    Maybe the easiest way to do something like this would be to use one of the logging tools that's already available, and provide nightly/delayed stats for browser/OS use. Webcounters are trivial to write, if highly inaccurate, as there's no real way to count a "unique" hit.

    As far as Apache stats tools go, I'd recommend something like Analog, which is a fairly powerful and customisable analysis tool.

    Hope that helps a little ..

    -- Foxcub
    #include www.liquidfusion.org.uk

      I see what you mean that checking against all IPs could get really slow. I doubt I would run into that problem since my site is small. However, perhaps a compromise would be to store the last 20 or so IPs, just so I don't count people hitting refresh as extra hits.
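      A rough sketch of that compromise (the window size and storage are arbitrary; note that a plain CGI starts a fresh process on every request, so the window would have to live in a file, or under mod_perl, to persist between hits):

          # Sliding window of the last 20 IPs seen, so a visitor
          # hitting refresh isn't counted repeatedly.
          my @recent;   # most recent IPs, oldest first
          my %seen;     # fast membership test for @recent

          sub count_hit {
              my ($ip) = @_;
              return 0 if $seen{$ip};    # still in the window: not a new hit
              push @recent, $ip;
              $seen{$ip} = 1;
              if (@recent > 20) {        # window full: forget the oldest IP
                  delete $seen{ shift @recent };
              }
              return 1;                  # countable hit
          }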

        I don't think that checking the current visitor's IP against all the IPs which have already visited your site would get really slow.

        The only way to get a count of unique IPs which have visited your site is to store the IP addresses in a database.

        Something like MySQL is probably the fastest solution although you could do it easily with any type of database which has a DBI/DBD interface.

        Checking and updating your database will be very fast as you can index on the field containing the IP. If you run it under Apache and mod_perl, you can even have persistent database connections so you save on the connect/disconnect overhead and the script stays "compiled" in between hits.
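        A sketch of that approach with DBI, assuming a MySQL table named visitors with a unique index on its ip column (the schema, connection details and column names are invented here):

            use strict;
            use warnings;
            use DBI;

            # Assumed schema:
            #   CREATE TABLE visitors (
            #       ip   VARCHAR(15) NOT NULL,
            #       hits INT         NOT NULL DEFAULT 1,
            #       UNIQUE KEY (ip)
            #   );
            my $dbh = DBI->connect('DBI:mysql:database=stats',
                                   'user', 'password', { RaiseError => 1 });

            sub record_visit {
                my ($ip) = @_;
                # The unique index makes this existence check an index lookup.
                my ($seen) = $dbh->selectrow_array(
                    'SELECT 1 FROM visitors WHERE ip = ?', undef, $ip);
                if ($seen) {
                    $dbh->do('UPDATE visitors SET hits = hits + 1 WHERE ip = ?',
                             undef, $ip);
                }
                else {
                    $dbh->do('INSERT INTO visitors (ip, hits) VALUES (?, 1)',
                             undef, $ip);
                }
            }

        Under mod_perl, Apache::DBI would make that $dbh connection persistent across hits.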

        Whether the values you obtain have any meaning is altogether another issue as there is no sure-fire way to link IP's to people.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

•Re: Website Statistics
by merlyn (Sage) on Jun 02, 2003 at 16:13 UTC
    Unique visitors (check by ip)
    No, an IP address is not a visitor. Corporations and ISPs using proxy caches funnel all their users' hits through what looks like a single IP address. And AOL uses multiple proxies, so the same AOL user can fetch from multiple IP addresses in a single page hit!

    Do not be fooled into thinking IP == user. Do not permit statistics to be reported as such. To do so is unethical. You can report it as "unique IP addresses", but never never never annotate that as "unique visitors" without a big disclaimer.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: Website Statistics
by phydeauxarff (Priest) on Jun 02, 2003 at 15:37 UTC
    It sounds like what you want to do was discussed in a Linux Magazine article - Getting a Handle on Traffic - in the October 2002 issue.

    Basically, the idea is to save all of your log data in a MySQL database, which then gives you a lot of flexibility in reporting.

    From your nick it appears you are a Debian user like myself... if so, check out libapache-dbilogger-perl, which purportedly does exactly what you describe.

Re: Website Statistics
by robobunny (Friar) on Jun 02, 2003 at 14:51 UTC
    If you want something that is updated immediately, instead of parsing the log files periodically, I'd suggest writing a module for Apache (assuming there isn't one already; I wasn't able to locate one with a quick search). The module itself could buffer updates in memory so it wouldn't kill performance. It seems like the sort of thing that would be useful to other people. There is a mod_log_mysql that might give you the functionality you're looking for if you don't mind running a small database.
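    A rough mod_perl 1.x sketch of such a handler (the module name, buffer size and log path are all invented; under a preforking Apache each child process keeps its own buffer, so some records sit unflushed until that child has seen enough hits):

        package My::StatsLogger;   # hypothetical name
        use strict;
        use warnings;
        use Apache::Constants qw(OK);

        my @buffer;   # per-child buffer of pending records

        sub handler {
            my $r = shift;         # the Apache request object
            push @buffer, join("\t",
                time,
                $r->connection->remote_ip,
                $r->header_in('User-Agent') || 'unknown',
            );
            # Flush only every 50 hits to keep per-request cost low.
            if (@buffer >= 50) {
                if (open my $fh, '>>', '/var/log/apache/stats.log') {
                    print $fh map { "$_\n" } @buffer;
                    close $fh;
                }
                @buffer = ();
            }
            return OK;
        }

        1;

    It would be wired in with a PerlLogHandler My::StatsLogger line in httpd.conf.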
Re: Website Statistics
by fglock (Vicar) on Jun 02, 2003 at 14:57 UTC

    A more efficient way to do this is to "rotate" the log (rename the log file) before processing it, and then store the preprocessed results in a file.

    When reporting results, you can read the previous results file, so that you only have to process the small amount of new data that has accumulated since the last analysis.
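    A sketch of that rotate-then-merge flow (the paths, the Storable results file and the use of apachectl graceful to make Apache reopen its log are my assumptions):

        use strict;
        use warnings;
        use Storable qw(retrieve store);

        my $log     = '/var/log/apache/access_log';
        my $chunk   = "$log.processing";
        my $results = '/var/www/stats/totals.sto';

        # Rotate: rename the live log, then have Apache reopen its files
        # so it starts writing a fresh one.
        rename $log, $chunk or die "rename: $!";
        system 'apachectl', 'graceful';

        # Load the running totals saved by the previous run, if any.
        my $totals = -e $results ? retrieve($results) : {};

        # Process only the newly accumulated lines.
        open my $fh, '<', $chunk or die "open: $!";
        while (my $line = <$fh>) {
            my ($ip) = split ' ', $line;   # first field of the common log format
            $totals->{unique_ips}{$ip}++;
            $totals->{hits}++;
        }
        close $fh;

        store($totals, $results);   # merged results for the next run
        unlink $chunk;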

Re: Website Statistics
by Plato (Friar) on Jun 03, 2003 at 08:26 UTC
    Not to plug any particular web analysis tool, but I've recently installed an app called awstats... I found it easy to set up and it gives me a huge amount of information. I have to say I really like it, plus it's free! It's distributed under the GNU General Public License (GPL). It's available in a few places, SourceForge for example. It works from the command line but also as a CGI. Oh, and it's also written in Perl.

    Plato