enoch has asked for the wisdom of the Perl Monks concerning the following question:
I am writing a quick script that will parse over my Apache access log and print out the most recent file accesses. The gotcha comes in with the fact that I want it to only print out the most recent file accessed from each unique IP, and I want the output sorted by date.
Thanks,
enoch
P.S. You gotta love those 'default.ida?NNNNNNN' entries.
Here is my code:
That code produces the following output:#!/usr/bin/perl use warnings; use strict; use Date::Manip; use vars qw(%ipHash); while(<DATA>) { / ^((\d{1,3}\.){3}\d{1,3}) # grab the IP address into $1 \s\-\s\-\s\[ (\d\d\/\w{3}\/\d\d(\d\d\:){3}\d\d) # grab the date into $3 \s\-\d{4}\]\s"\w{1,4}\s ([\/|\w|\.|_]+) # grab the file path into $5 /x and $ipHash{&UnixDate($3,"%s")} = [$1, $3, $5]; } print join "\n", map {$ipHash{$_}[0] . " => " . $ipHash{$_}[1] . "\t" . $ipHash{$_} +[2]} sort keys %ipHash; __DATA__ 209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%%35%63 +../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 303 209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%%35c.. +/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 303 209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%25%35% +63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 313 209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%252f.. +/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 313 68.9.44.75 - - [17/Oct/2002:06:50:34 -0400] "GET /phpMyAdmin HTTP/1.1" + 301 322 68.9.44.75 - - [17/Oct/2002:06:50:34 -0400] "GET /phpMyAdmin/ HTTP/1.1 +" 200 898 68.9.44.75 - - [17/Oct/2002:06:50:36 -0400] "GET /phpMyAdmin/left.php? +lang=en-iso-8859-1&convcharset=iso-8859-1&server=1 HTTP/1.1" 200 1024 129.22.39.158 - - [17/Oct/2002:18:05:10 -0400] "OPTIONS / HTTP/1.1" 20 +0 0 160.79.211.121 - - [17/Oct/2002:19:51:31 -0400] "GET /default.ida?NNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +NNNNNNNNNNNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u +6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u007 +8%u0000%u00=a HTTP/1.0" 400 303 129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php HTTP/1.1" + 200 25430 129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php?=PHPE9568 +F35-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 4440 129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php?=PHPE9568 +F34-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 2962 129.22.82.8 - - [17/Oct/2002:21:25:44 -0400] "GET / HTTP/1.1" 200 2673 129.22.82.8 - - [17/Oct/2002:21:25:44 -0400] "GET /manual/images/apach +e_pb.gif HTTP/1.1" 404 302
Now, ideally, I would not want the IP's repeated. Rather, I just want to see the last file accessed by that IP. So, the output would look like:209.36.83.252 => 17/Oct/2002:05:53:17 /scripts/.. 68.9.44.75 => 17/Oct/2002:06:50:34 /phpMyAdmin/ 68.9.44.75 => 17/Oct/2002:06:50:36 /phpMyAdmin/left.php 160.79.211.121 => 17/Oct/2002:19:51:31 /default.ida 129.22.82.8 => 17/Oct/2002:20:37:10 /index.php 129.22.82.8 => 17/Oct/2002:21:25:44 /manual/images/apache_pb.gif
But, to maintain my sorting by date, I key the hash by the Unix timestamp and not the IP. Would I need to set up a dualing hash thing so I can sort by date but keep only one entry for each IP address? I just can't seem to wrap my head around this and wondered if any monks had some nifty ideas.209.36.83.252 => 17/Oct/2002:05:53:17 /scripts/.. 68.9.44.75 => 17/Oct/2002:06:50:34 /phpMyAdmin/ 160.79.211.121 => 17/Oct/2002:19:51:31 /default.ida 129.22.82.8 => 17/Oct/2002:21:25:44 /manual/images/apache_pb.gif
Thanks,
enoch
P.S. You gotta love those 'default.ida?NNNNNNN' entries.
Back to
Seekers of Perl Wisdom