PerlMonks  

apache log splitter

by fuzzysteve (Beadle)
on Nov 16, 2001 at 19:15 UTC

Takes an Apache log and splits it into a number of logfiles, one for each day on which traffic took place. (Eventually I'm going to rework it to use Date::Manip and output the week number as well.)
#!/usr/bin/perl -w
use strict;

## Just a small file to split apache logs up into days. (Should work for
## any log that has its date and time in the same format as apache:
## [dd/Mon/yyyy:rest of stamp])
## Nothing particularly fancy.

## Date extraction from apache log files. combined format.
sub get_date_from_log_line{
    my %date;
    my $line = shift;
    my $dateline=$1 if ($line=~ m/(\[.+?\])/);
    my @datestring=split(/:/,$dateline);
    substr($datestring[0],0,1)="";    # drop the leading "["
    return $datestring[0];
}

## Basic Variable setups. ##
my $pathname=shift(@ARGV) or die("Two arguments please: log file to be split, and where to put the split files");
my $final_directory=shift(@ARGV) or die("Two arguments please: log file to be split, and where to put the split files");
my $date;
my $date_last;
my $line;
## /Variables ##

open(FILE1,"$pathname") or die("bugger $pathname\n");
$line=<FILE1>;
$date=&get_date_from_log_line($line);
my $timeStamp=$date;
$timeStamp =~ s/\///g;
my $outputfile="$final_directory$timeStamp.log";
open (OUTFILE,">$outputfile") or die("damn it to hell $!\n$outputfile\n");
print OUTFILE $line;

until (eof(FILE1)) {
    $line=<FILE1>;
    $date_last=$date;
    $date=&get_date_from_log_line($line);
    if ($date_last ne $date){
        ## New day: close the current file and start the next one.
        close (OUTFILE);
        $timeStamp=$date;
        $timeStamp =~ s/\///g;
        $outputfile="$final_directory$timeStamp.full.log";
        open (OUTFILE,">$outputfile") or die("damn it to hell $!\n$outputfile\n");
    }
    print OUTFILE $line;
}
close(OUTFILE);
print "Files Split.\n";

Update: revised and removed the .*
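The post mentions eventually using Date::Manip to output the week number. As a rough sketch of the same idea using only a core module, the snippet below uses Time::Piece instead (a swapped-in choice, not the author's stated approach). The helper name `week_of` is hypothetical; it assumes the "dd/Mon/yyyy" string that `get_date_from_log_line` returns, with English month abbreviations.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;    # core module since Perl 5.10

# Hypothetical helper (not part of the original script): takes a date in
# the "dd/Mon/yyyy" form found in Apache timestamps and returns its
# ISO 8601 week number.
sub week_of {
    my ($date) = @_;
    my $t = Time::Piece->strptime($date, '%d/%b/%Y');
    return $t->week;
}

print week_of('16/Nov/2001'), "\n";
```

Note that ISO weeks can straddle a year boundary, so a filename built from the calendar year plus this week number may be misleading for the last days of December or the first days of January.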

Replies are listed 'Best First'.
Re: apache log splitter (bug)
by humanclock (Initiate) on Oct 03, 2009 at 06:15 UTC

    This code assumes that the logfile is in ascending order, which is not always the case at midnight on higher traffic websites. A line or two with the previous day's timestamp can still show up in the logfile during the first minute of the new day.

    Since this script creates a new logfile rather than appending to an existing one, a single out-of-order line in the logfile will destroy what was already written for that entire day.
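One possible way around the problem described above is to keep one filehandle per day and open each in append mode, so a late line for the previous day is added to that day's file rather than truncating it. This is only a sketch under assumptions: the function name `split_log_by_day` and the hash `%out_for` are hypothetical, every file is named with a plain `.log` suffix, and lines without a bracketed timestamp are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rewrite of the splitting loop: one append-mode handle per
# day, cached in a hash, so out-of-order lines never truncate a file.
sub split_log_by_day {
    my ($in_fh, $final_directory) = @_;
    my %out_for;    # date stamp => open output filehandle

    while (my $line = <$in_fh>) {
        # Text between the first "[" and the first ":", e.g. 16/Nov/2001.
        my ($date) = $line =~ m/\[([^:\]]+)/
            or next;    # no timestamp found: skip the line
        (my $stamp = $date) =~ s{/}{}g;

        unless ($out_for{$stamp}) {
            my $outputfile = "$final_directory$stamp.log";
            open($out_for{$stamp}, '>>', $outputfile)
                or die("cannot append to $outputfile: $!\n");
        }
        print { $out_for{$stamp} } $line;
    }

    close($_) for values %out_for;
    my @stamps = sort keys %out_for;
    return @stamps;
}

# Usage: same two arguments as the original script.
if (@ARGV == 2) {
    my ($pathname, $final_directory) = @ARGV;
    open(my $in, '<', $pathname) or die("cannot read $pathname: $!\n");
    split_log_by_day($in, $final_directory);
    close($in);
    print "Files Split.\n";
}
```

Because the files are opened for appending, running this twice over the same log duplicates every line, so the output directory should be empty before a rerun. A log that spans a very large number of days could also hit the per-process limit on open files.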
