PerlMonks  

How much time do you spend at Perlmonks? (personal web proxy)

by gregorovius (Friar)
on Oct 15, 2000 at 11:01 UTC ( [id://36795]=CUFP )

This is a proxy server that records all your web browsing activity. It's based on the cool NetServer::Generic module, which lets you focus on the core of your server app and leads to compact servers. Performance-wise it may slightly affect your browser's response time, but it's lightweight enough to go unnoticed on a fast machine. It also uses Proc::Daemon to do all the things decent daemons should, like detaching from the controlling terminal.

Usage: First configure it by changing the variables at the top to suit your browsing habits and bandwidth. $pageview_range (in seconds) should be a little longer than the average time it takes your browser to issue requests for all component files of a page view. $per_page_time (in minutes) is the average time you spend on a page; the program uses it to give you a simple approximation of the time you spend on the web. $listen_port is the port you want your proxy to listen on, and $logfile should be the path to the logfile where your web browsing activity is to be recorded.
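To make the role of $pageview_range and $per_page_time concrete, here is a minimal sketch of the grouping idea for a single server. The helper name count_pageviews and the sample timestamps are made up for illustration; the program itself does this inline while reading the log.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the program): count page views in a
# sorted list of epoch timestamps for one server. A hit starts a new page
# view only if it arrives more than $range seconds after the last counted hit.
sub count_pageviews {
    my ($range, @times) = @_;
    my ($views, $last) = (0, undef);
    for my $t (@times) {
        if (!defined $last || $t - $last > $range) {
            $views++;
            $last = $t;
        }
    }
    return $views;
}

my $pageview_range = 20;    # seconds
my $per_page_time  = 2;     # minutes

# Made-up sample: three bursts of requests
my @hits  = (100, 101, 103, 130, 131, 200);
my $views = count_pageviews($pageview_range, @hits);
printf "%d page views, about %d minutes\n", $views, $views * $per_page_time;
```

With these sample values the six requests collapse into three page views. A larger $pageview_range merges more requests into one view, which is why it should only be slightly longer than a typical page load.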

If you run this proxy on your own machine you should configure your browser to use a proxy on localhost and the port you configured the program with.
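Once the browser is pointed at a proxy, it sends the full URL in the request line instead of just the path, and that is what the program's regular expression picks apart. A small sketch of that step, using the same pattern; the wrapper parse_request_line and the sample request are only for illustration:

```perl
use strict;
use warnings;

# Illustrative wrapper around the pattern the program uses. Returns
# (method, host, port, path, version), or an empty list if the line is
# not a proxy-style request for an http:// URL.
sub parse_request_line {
    my ($line) = @_;
    return unless $line =~ m[(\w+)\s+http://([^/:]+)(:(\d+))?(\S*)\s+(\S+)];
    return ($1, $2, $4 || 80, $5, $6);
}

my ($method, $host, $port, $path, $version) =
    parse_request_line("GET http://www.perlmonks.org/index.pl?node_id=36795 HTTP/1.0\r\n");
print "${method} ${host}:${port} ${path} ${version}\n";
```

A request sent directly to a web server (for example "GET /index.pl HTTP/1.0") does not match the pattern, so the program answers it with 400 Bad Request.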

After doing this just continue your happy browsing, and when you're curious about how much time you spend on the web (and on any particular site you visit) just go to the http://stats URL and you'll get a nice report from the proxy.

It works under Linux, and may also run on NT. This is my first cut at it and I haven't tested it exhaustively. Please provide any comments you may have on functionality or style.

Warning: This program could be a very nasty thing to use on coworkers, but may be of great help in monitoring a child's use of the web.

Update: I found out that my program will fill the process table with zombie processes. Looks like a bug in NetServer::Generic, since the ones I fork are properly ignored by the parent and do not remain.

Update: Wow. It turns out that NetServer::Generic does indeed have a bug. If you want to fix it yourself, go to the source (file Generic.pm) and replace every line $SIG{CHLD} = &reap_child(); with $SIG{CHLD} = \&reap_child;. I'm using v1.02, which is the most recent. I'll submit a patch to the author.
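The difference between the two forms in the update above is easy to miss, so here is a small sketch of it. The reap_child below is a stand-in, not the module's real handler, and the values are kept in plain variables rather than %SIG so the snippet has no side effects.

```perl
use strict;
use warnings;

my $calls = 0;
sub reap_child { $calls++; return 'done' }    # stand-in handler

# Buggy form: reap_child() runs immediately and its return value is what
# gets stored, so no subroutine is installed as the handler.
my $buggy = &reap_child();

# Fixed form: a reference to the subroutine, to be called on each signal.
my $fixed = \&reap_child;

print "buggy holds a ", (ref($buggy) || 'plain value'), "\n";
print "fixed holds a ", ref($fixed), " reference\n";
print "reap_child has run $calls time(s) so far\n";
```

With the buggy assignment nothing reaps exited children when SIGCHLD arrives, which would explain the zombies mentioned in the first update.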

#!/usr/bin/perl -w
use strict;
use NetServer::Generic;
use Proc::Daemon;
use IO::Socket::INET;

my $listen_port    = 8080;
my $logfile        = '/tmp/proxy_log';
my $pageview_range = 20;    # seconds
my $per_page_time  = 2;     # minutes

# Per-connection callback: reads the request from STDIN and writes the
# response to STDOUT.
my $server_cb = sub {
    my ($s) = shift;
    my $line1 = <STDIN>;
    unless (defined $line1
        && $line1 =~ m[(\w+)\s+http://([^/:]+)(:(\d+))?(\S*)\s+(\S+)]) {
        print STDOUT "HTTP/1.0 400 Bad Request\nConnection: close\n\n";
        print STDOUT "HTTP/1.0 400 Bad Request\n";
        return;
    }
    my ($method, $serv, $port, $path, $version) = ($1, $2, $4, $5, $6);

    if ($serv !~ /stats/) {
        # Ordinary request: relay it to the real server.
        my $sock = IO::Socket::INET->new(PeerAddr => $serv,
                                         PeerPort => $port || 80,
                                         Proto    => 'tcp');
        unless ($sock) {
            print STDOUT "HTTP/1.0 502 Bad Gateway\nConnection: close\n\n";
            return;
        }
        print $sock "$method $path $version\n";
        print $sock "Connection: close\n";
        $SIG{CHLD} = 'IGNORE';
        if (my $pid = fork) {
            # Parent: pass the rest of the request on to the server.
            while (<STDIN>) { print $sock $_; }
        }
        else {
            # Child: pass the server's response back to the browser.
            while (<$sock>) { print STDOUT $_; }
        }
    }
    else {
        # Request for http://stats: print the report.
        my $stats = &getStats();
        print STDOUT "HTTP/1.1 200 OK\nContent-type: text/plain\n";
        print STDOUT "Connection: close\n\n";
        print STDOUT "Your Browsing Stats!\n\n";
        print STDOUT $stats->{DAY}   || 0, " page views in the last day\n";
        print STDOUT $stats->{WEEK}  || 0, " page views in the last week\n";
        print STDOUT $stats->{MONTH} || 0, " page views in the last month\n";
        print STDOUT $stats->{YEAR}  || 0, " page views in the last year\n\n";
        my $avg_time = (($stats->{MONTH} || 0) / 30) * $per_page_time;
        print STDOUT "At $per_page_time minutes per page that's $avg_time minutes per day in the last month.\n\n";
        print STDOUT "Your favorite sites:\n\n";
        # Servers sorted by total page views, most visited first.
        foreach (map  { $_->[0] }
                 sort { $b->[1] <=> $a->[1] }
                 map  { [$_, $stats->{BY_SERVER}{$_}{TOTAL}] }
                 (keys %{$stats->{BY_SERVER}})) {
            print STDOUT "$_\n";
            print STDOUT "--------------------------------------------\n";
            print STDOUT $stats->{BY_SERVER}{$_}{DAY}   || 0, " page views in the last day\n";
            print STDOUT $stats->{BY_SERVER}{$_}{WEEK}  || 0, " page views in the last week\n";
            print STDOUT $stats->{BY_SERVER}{$_}{MONTH} || 0, " page views in the last month\n";
            print STDOUT $stats->{BY_SERVER}{$_}{YEAR}  || 0, " page views in the last year\n\n";
        }
    }

    # Record the server name and the time of the request.
    open LOG, ">>$logfile" or die "could not open $logfile: $!";
    print LOG $serv, ' ', time, "\n";
    close LOG;
};

my ($foo) = new NetServer::Generic;
$foo->port($listen_port);
$foo->callback($server_cb);
$foo->mode('forking');
print "Starting server\n";
&Proc::Daemon::Init();
$foo->run();

# Read the log and count page views per server and per period.
sub getStats {
    my $now       = time;
    my $day_ago   = $now - 60 * 60 * 24;
    my $week_ago  = $now - 60 * 60 * 24 * 7;
    my $month_ago = $now - 60 * 60 * 24 * 30;     # 30 days
    my $year_ago  = $now - 60 * 60 * 24 * 365;    # 365 days
    my ($serv, $time, %hits);
    open LOG, "<$logfile" or die "could not open $logfile: $!";
    while (<LOG>) {
        ($serv, $time) = split;
        next unless defined $time;    # skip blank or malformed lines
        # Requests closer together than $pageview_range count as one page view.
        if ($time - ($hits{BY_SERVER}{$serv}{LAST} || 0) > $pageview_range) {
            $hits{BY_SERVER}{$serv}{HITS}{$time} = 1;
            $hits{BY_SERVER}{$serv}{LAST} = $time;
            $hits{BY_SERVER}{$serv}{TOTAL}++;
            if ($day_ago < $time) {
                $hits{BY_SERVER}{$serv}{DAY}++;
                $hits{BY_SERVER}{$serv}{WEEK}++;
                $hits{BY_SERVER}{$serv}{MONTH}++;
                $hits{BY_SERVER}{$serv}{YEAR}++;
                $hits{DAY}++; $hits{WEEK}++; $hits{MONTH}++; $hits{YEAR}++;
            }
            elsif ($week_ago < $time) {
                $hits{BY_SERVER}{$serv}{WEEK}++;
                $hits{BY_SERVER}{$serv}{MONTH}++;
                $hits{BY_SERVER}{$serv}{YEAR}++;
                $hits{WEEK}++; $hits{MONTH}++; $hits{YEAR}++;
            }
            elsif ($month_ago < $time) {
                $hits{BY_SERVER}{$serv}{MONTH}++;
                $hits{BY_SERVER}{$serv}{YEAR}++;
                $hits{MONTH}++; $hits{YEAR}++;
            }
            elsif ($year_ago < $time) {
                $hits{BY_SERVER}{$serv}{YEAR}++;
                $hits{YEAR}++;
            }
        }
    }
    close LOG;
    \%hits;
}

Replies are listed 'Best First'.
RE: How much time do you spend at Perlmonks? (personal web proxy)
by elwarren (Priest) on Oct 23, 2000 at 23:26 UTC
    Very cool program. I just tried it on my NT workstation and it does not work. Seems that ActiveState perl does not have the NetServer::Generic module. Probably because of forking?

    Does anybody know if it's possible to download a ppm of this module for NT?
      NetServer::Generic is pure perl, so you can install it by hand if you download it from CPAN. To do this just download from CPAN, create a NetServer folder in your site library directory (usually C:\Perl\site\lib) and drop the Generic.pm package there. I have used this module under NT in the past but I haven't tested this program on it. If it doesn't work then you may be able to rewrite the program to do without NetServer::Generic, as fork (a pseudofork) is implemented on Perl for Win32.
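      After dropping Generic.pm into place, one way to confirm that Perl can find it is a quick check like the sketch below. The helper module_location is made up for this example; it only tries to load the module and reports the path it was loaded from.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file a module was loaded from, or undef
# if it cannot be loaded from any directory in @INC.
sub module_location {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    my $ok = eval { require $file; 1 };
    return $ok ? $INC{$file} : undef;
}

my $where = module_location('NetServer::Generic');
print defined $where
    ? "NetServer::Generic found at $where\n"
    : "NetServer::Generic is not installed\n";
```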
