This is a proxy server that records all your web browsing
activity. It's based on the cool
NetServer::Generic module, which lets
you focus on the core of your server app and leads to compact
servers. Performance-wise, it may add a slight delay
to your browser's response time, but it's lightweight
enough that you won't notice on a fast machine. It also
uses
Proc::Daemon to do all the things a decent
daemon should, such as detaching itself from the controlling
terminal.
Usage: First configure it by changing the variables
at the top to suit your browsing habits and bandwidth.
$pageview_range (in seconds) should be a little longer than the
average time it takes your browser to request all the
component files of a page view; requests to the same server
within that window are counted as a single page view.
$per_page_time is the average time (in minutes) you spend on a
page, which the program uses to give you a simple approximation
of the time you spend on the web. $listen_port is the port you
want your proxy to listen on, and $logfile should be the path
to the logfile where your web browsing activity is to be
recorded.
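To make the two settings concrete, here is a small standalone sketch of how the report turns logged requests into page views and minutes per day. It mirrors the logic of getStats in the script; the helper names count_pageviews and minutes_per_day, and the sample host names and timestamps, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring getStats: a request to a server counts as a
# new page view only if it comes more than $range seconds after the last
# request that was counted for that server; anything closer is treated as a
# component (image, stylesheet, frame) of the same page.
sub count_pageviews {
    my ($range, @requests) = @_;    # each request: [server, epoch_seconds], in log order
    my (%last, %views);
    for my $req (@requests) {
        my ($serv, $time) = @$req;
        if (!defined $last{$serv} || $time - $last{$serv} > $range) {
            $views{$serv}++;
            $last{$serv} = $time;
        }
    }
    return \%views;
}

# Hypothetical helper: the report's estimate of minutes per day,
# (page views in the last month / 30) * minutes per page.
sub minutes_per_day {
    my ($month_views, $per_page_time) = @_;
    return ($month_views / 30) * $per_page_time;
}

my $views = count_pageviews(20,
    ['www.perlmonks.org', 1000],
    ['www.perlmonks.org', 1005],    # 5 s later: same page view
    ['www.perlmonks.org', 1030],    # 30 s after 1000: new page view
    ['www.cpan.org',      1031]);
print "$_: $views->{$_}\n" for sort keys %$views;
# www.cpan.org: 1
# www.perlmonks.org: 2
print minutes_per_day(60, 2), " minutes per day\n";    # 4 minutes per day
```

Note that the window is measured from the first request of a page view, not from the most recent request, so a long burst of requests can still be split into two page views.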
If you run this proxy on your own machine you
should configure your browser to use a proxy on
localhost and the port you configured the program
with.
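With the proxy configured, the browser sends the full URL in the request line (for example GET http://www.perl.org/about.html HTTP/1.0) rather than just the path, and that is what the proxy's callback parses. The sketch below applies the same regular expression to a request line; parse_proxy_request is a hypothetical name, and treating an empty path as / is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split a proxy-style request line into its parts
# using the same pattern as the proxy's callback. Returns undef when the
# line is not an absolute http:// request.
sub parse_proxy_request {
    my ($line) = @_;
    return unless $line =~ m[(\w+)\s+http://([^/:]+)(:(\d+))?(\S*)\s+(\S+)];
    my %req = (
        method  => $1,
        host    => $2,
        port    => $4 || 80,                   # no explicit port means 80
        path    => length $5 ? $5 : '/',       # assumption: empty path means "/"
        version => $6,
    );
    return \%req;
}

my $req = parse_proxy_request("GET http://www.perl.org:8000/about.html HTTP/1.0\r\n");
print "$req->{method} $req->{host} $req->{port} $req->{path}\n";
# GET www.perl.org 8000 /about.html
```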
After
doing this, just continue your happy browsing, and when
you're curious about how much time you spend on the web
(and on any particular site you visit), just go to the
http://stats URL and you'll get a nice report
from the proxy.
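The report's day/week/month/year figures come down to comparing each page-view timestamp with a cutoff. A minimal sketch, assuming a 30-day month and a 365-day year; window_counts and SECS_PER_DAY are hypothetical names, not part of the proxy.

```perl
use strict;
use warnings;

use constant SECS_PER_DAY => 60 * 60 * 24;

# Hypothetical helper: count how many page-view timestamps (epoch seconds)
# fall inside each reporting window ending at $now. The windows are nested,
# so a view from an hour ago counts toward the day, week, month and year.
# Assumes a 30-day month and a 365-day year.
sub window_counts {
    my ($now, @times) = @_;
    my %cutoff = (
        DAY   => $now - SECS_PER_DAY,
        WEEK  => $now - 7 * SECS_PER_DAY,
        MONTH => $now - 30 * SECS_PER_DAY,
        YEAR  => $now - 365 * SECS_PER_DAY,
    );
    my %count;
    $count{$_} = 0 for keys %cutoff;
    for my $t (@times) {
        for my $window (keys %cutoff) {
            $count{$window}++ if $t > $cutoff{$window};
        }
    }
    return \%count;
}

my $now = time;
my $c = window_counts($now,
    $now - 3600,                   # an hour ago
    $now - 3 * SECS_PER_DAY,       # three days ago
    $now - 100 * SECS_PER_DAY);    # about three months ago
print "$_: $c->{$_}\n" for qw(DAY WEEK MONTH YEAR);
# DAY: 1
# WEEK: 2
# MONTH: 2
# YEAR: 3
```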
It works under Linux, and
may also run on NT. This is my first cut at it and
I haven't tested it exhaustively. Please provide any
comments you may have on functionality or style.
Warning: This program could be a very nasty thing
to use on coworkers, but it may be of great help in monitoring
a child's use of the web.
Update: I found out that my program fills the
process table with zombie processes. This looks like a bug in
NetServer::Generic, since the processes I fork myself are
properly ignored by the parent ($SIG{CHLD} = 'IGNORE') and do not remain.
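For background: a child that has exited stays in the process table as a zombie until its parent collects its exit status with wait or waitpid. A common Perl pattern, sketched below as a standalone script, is a SIGCHLD handler that reaps every finished child without blocking. The name reap_children is hypothetical; it is not the sub from Generic.pm.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # exports WNOHANG

my $reaped = 0;

# SIGCHLD handler: several children may exit while only one signal is
# delivered, so keep calling waitpid until no finished child is left.
# WNOHANG makes waitpid return 0 instead of blocking when children are
# still running, and -1 when there are none at all.
sub reap_children {
    local ($!, $?);    # don't clobber the main program's error/status vars
    while (waitpid(-1, WNOHANG) > 0) {
        $reaped++;
    }
}
$SIG{CHLD} = \&reap_children;    # a reference to the sub, not a call

for (1 .. 3) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: exit immediately
}

# Parent: wait up to five seconds for the handler to collect all three.
my $deadline = time + 5;
select(undef, undef, undef, 0.1) while $reaped < 3 && time < $deadline;
print "reaped $reaped children\n";    # normally "reaped 3 children"
```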
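Update: Wow... It turns out that NetServer::Generic
does indeed have a bug. If you want to fix it yourself, go to
the source (file Generic.pm) and replace every line
$SIG{CHLD} = &reap_child(); with
$SIG{CHLD} = \&reap_child;.
The first form calls reap_child once, right away, and stores its
return value in $SIG{CHLD}; the second installs a reference to the
sub as the signal handler.
I'm using v1.02, which is the most recent version. I'll submit
a patch to the author.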
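The difference between the two lines is easy to see in isolation. The sketch below uses a stand-in reap_child that only counts its calls; it is not the sub from Generic.pm.

```perl
use strict;
use warnings;

my $calls = 0;
sub reap_child { $calls++; return 1 }    # stand-in that only counts calls

# "&reap_child()" calls the sub immediately and yields its return value.
my $wrong = &reap_child();
# "\&reap_child" takes a reference to the sub without calling it,
# which is the kind of value %SIG expects for a handler.
my $right = \&reap_child;

printf "calls=%d wrong=%s right=%s\n", $calls, $wrong, ref $right;
# calls=1 wrong=1 right=CODE
```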
#!/usr/bin/perl -w
use strict;
use IO::Socket::INET;
use NetServer::Generic;
use Proc::Daemon;
my $listen_port = 8080;
my $logfile = '/tmp/proxy_log';
my $pageview_range = 20; # seconds
my $per_page_time = 2; # minutes
my $server_cb = sub {
my ($s) = shift;
my $line1 = <STDIN>;
unless(defined $line1 && $line1 =~ m[(\w+)\s+http://([^/:]+)(:(\d+))?(\S*)\s+(\S+)]) {
print STDOUT "HTTP/1.0 400 Bad Request\r\nConnection: close\r\n\r\n";
return;
}
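my ($method, $serv, $port, $path, $version) = ($1, $2, $4, $5, $6);
$path = '/' unless length $path;    # "http://host" with no path means "/"
if($serv ne 'stats') {    # exact match, so hosts like "mystats.com" are still proxied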
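my $sock = IO::Socket::INET->new(PeerAddr => $serv,
PeerPort => $port || 80,
Proto => 'tcp');
unless($sock) {
print STDOUT "HTTP/1.0 502 Bad Gateway\r\nConnection: close\r\n\r\n";
return;
}
# HTTP lines end in CRLF; the browser's own headers follow in the loop below
print $sock "$method $path $version\r\n";
print $sock "Connection: close\r\n";
$SIG{CHLD} = 'IGNORE';
my $pid = fork;
die "fork failed: $!" unless defined $pid;
if($pid) {
# parent: copy the rest of the browser's request to the server
while(<STDIN>) {
print $sock $_;
}
} else {
# child: copy the server's response back to the browser, then exit so
# that only the parent falls through to the logging code below
while(<$sock>){
print STDOUT $_;
}
exit 0;
}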
} else {
my $stats = &getStats();
print STDOUT "HTTP/1.1 200 OK\r\nContent-type: text/plain\r\n";
print STDOUT "Connection: close\r\n\r\n";
print STDOUT "Your Browsing Stats!\n\n";
print STDOUT $stats->{DAY} || 0, " page views in the last day\n";
print STDOUT $stats->{WEEK} || 0, " page views in the last week\n";
print STDOUT $stats->{MONTH} || 0, " page views in the last month\n";
print STDOUT $stats->{YEAR} || 0, " page views in the last year\n\n";
# rough estimate: average page views per day over the last 30 days
my $avg_time = sprintf '%.1f', (($stats->{MONTH} || 0) / 30) * $per_page_time;
print STDOUT "At $per_page_time minutes per page that's $avg_time minutes per day in the last month.\n\n";
print STDOUT "Your favorite sites:\n\n";
foreach(map {$_->[0]}
sort{$b->[1] <=> $a->[1]}
map{[$_, $stats->{BY_SERVER}{$_}{TOTAL}]}
(keys %{$stats->{BY_SERVER}})) {
print STDOUT "$_\n";
print STDOUT "--------------------------------------------\n";
print STDOUT $stats->{BY_SERVER}{$_}{DAY} || 0,
" page views in the last day\n";
print STDOUT $stats->{BY_SERVER}{$_}{WEEK} || 0,
" page views in the last week\n";
print STDOUT $stats->{BY_SERVER}{$_}{MONTH} || 0,
" page views in the last month\n";
print STDOUT $stats->{BY_SERVER}{$_}{YEAR} || 0,
" page views in the last year\n\n";
}
}
# one "server epoch-seconds" line per request
open(my $log, '>>', $logfile) or die "could not open $logfile: $!";
print $log $serv, ' ', time, "\n";
close $log;
};
my $foo = NetServer::Generic->new;
$foo->port($listen_port);
$foo->callback($server_cb);
$foo->mode('forking');
print "Starting server\n";
Proc::Daemon::Init();    # detach from the controlling terminal
$foo->run();
sub getStats {
my $now = time;
my $day_ago = $now - 60 * 60 * 24;
my $week_ago = $now - 60 * 60 * 24 * 7;
my $month_ago = $now - 60 * 60 * 24 * 30;    # 30 days, not 30 weeks
my $year_ago = $now - 60 * 60 * 24 * 365;
my($serv, $time, %hits);
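open(LOG, '<', $logfile) or die "could not open $logfile: $!";
while(<LOG>) {
($serv, $time) = split;
next unless defined $time;    # skip blank or malformed lines
# a new page view if this server wasn't hit within the last $pageview_range seconds
if($time - ($hits{BY_SERVER}{$serv}{LAST} || 0) > $pageview_range) {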
$hits{BY_SERVER}{$serv}{HITS}{$time} = 1;
$hits{BY_SERVER}{$serv}{LAST} = $time;
$hits{BY_SERVER}{$serv}{TOTAL}++;
if($day_ago < $time) {
$hits{BY_SERVER}{$serv}{DAY}++;
$hits{BY_SERVER}{$serv}{WEEK}++;
$hits{BY_SERVER}{$serv}{MONTH}++;
$hits{BY_SERVER}{$serv}{YEAR}++;
$hits{DAY}++;
$hits{WEEK}++;
$hits{MONTH}++;
$hits{YEAR}++;
} elsif ($week_ago < $time) {
$hits{BY_SERVER}{$serv}{WEEK}++;
$hits{BY_SERVER}{$serv}{MONTH}++;
$hits{BY_SERVER}{$serv}{YEAR}++;
$hits{WEEK}++;
$hits{MONTH}++;
$hits{YEAR}++;
} elsif ($month_ago < $time) {
$hits{BY_SERVER}{$serv}{MONTH}++;
$hits{BY_SERVER}{$serv}{YEAR}++;
$hits{MONTH}++;
$hits{YEAR}++;
} elsif ($year_ago < $time) {
$hits{BY_SERVER}{$serv}{YEAR}++;
$hits{YEAR}++;
}
}
}
close LOG;
\%hits;
}