PerlMonks  

Parsing LDAP log file given a time period

by spartan (Pilgrim)
on Oct 20, 2006 at 16:34 UTC ( [id://579636]=perlquestion )

spartan has asked for the wisdom of the Perl Monks concerning the following question:

Good day all, I'm looking for another way to skin a particular cat. Let me preface this with the fact that I am in no way shape or form a highly skilled programmer. I write scripts at best, but they always seem to get done what I need. I'll start by explaining my problem.

I write small tools at times to help me analyze log files and the like. Today's problem is this: I'm parsing an LDAP log file for BINDs and UNBINDs. The following code takes data like this:

  • up27 ,Oct 17 00:00:09, 2605039,BIND
  • up27 ,Oct 17 00:00:09, 2605039,UNBIND
  • up27 ,Oct 17 00:00:09, 2605040,BIND
  • up27 ,Oct 17 00:00:09, 2605040,UNBIND
  • ....
  • up27 ,Oct 17 23:59:09, 2615039,BIND
  • up27 ,Oct 17 23:59:09, 2615039,UNBIND
  • up27 ,Oct 17 23:59:09, 2615040,BIND
  • up27 ,Oct 17 23:59:09, 2615040,UNBIND
and spits out data like this:
BINDS UNBINDS
Hour 00 has  1433 BINDS and  1423 UNBINDS TOTAL=2856
Hour 01 has  1501 BINDS and  1502 UNBINDS TOTAL=3003
Hour 02 has  1278 BINDS and  1279 UNBINDS TOTAL=2557
Hour 03 has  1269 BINDS and  1262 UNBINDS TOTAL=2531
Hour 04 has   637 BINDS and   629 UNBINDS TOTAL=1266
Hour 05 has   327 BINDS and   323 UNBINDS TOTAL=650
Hour 06 has   363 BINDS and   354 UNBINDS TOTAL=717
Hour 07 has  1497 BINDS and  1478 UNBINDS TOTAL=2975
Hour 08 has  3389 BINDS and  3354 UNBINDS TOTAL=6743
Hour 09 has 14671 BINDS and 14646 UNBINDS TOTAL=29317
Hour 10 has  4215 BINDS and  4146 UNBINDS TOTAL=8361
Hour 11 has  3254 BINDS and  3210 UNBINDS TOTAL=6464
Hour 12 has  2795 BINDS and  2757 UNBINDS TOTAL=5552
Hour 13 has  2553 BINDS and  2517 UNBINDS TOTAL=5070
Hour 14 has  2592 BINDS and  2521 UNBINDS TOTAL=5113
Hour 15 has  6258 BINDS and  6229 UNBINDS TOTAL=12487
Hour 16 has  2416 BINDS and  2384 UNBINDS TOTAL=4800
Hour 17 has  2315 BINDS and  2263 UNBINDS TOTAL=4578
Hour 18 has  1838 BINDS and  1819 UNBINDS TOTAL=3657
Hour 19 has  1972 BINDS and  1923 UNBINDS TOTAL=3895
Hour 20 has  1672 BINDS and  1662 UNBINDS TOTAL=3334
Hour 21 has  1540 BINDS and  1501 UNBINDS TOTAL=3041
Hour 22 has  1367 BINDS and  1346 UNBINDS TOTAL=2713
Hour 23 has  1348 BINDS and  1321 UNBINDS TOTAL=2669
Here is the code that accomplishes this:
#!/usr/local/bin/perl -w
use strict;

my $count=00;
my $processed_log=$ARGV[0];

# Now that we have the logs, let's give a breakdown of BIND/UNBINDS
# per hour. To start, I'll make it easy and just do them on an hourly
# basis (eg. 09:00-09:59, 10:00-10:59, etc...)
open PROCESSED_LOG, "<$processed_log" or die "Cannot open $processed_log: $!\n";
my @LOGFILE=<PROCESSED_LOG>;
print"BINDS UNBINDS\n";
for ($count=00; $count<24; ++$count) {
    $count=sprintf("%02d",$count);
    my $bind=0;
    my $unbind=0;
    # I know I'm looping over and over this array. I have to
    # because the parsed log file does not begin or end on
    # hour 00:00:00. It actually starts at about 02:14:00.
    # Why they chose to roll the log file at 2:15 am, I'll
    # never know, but there you have it.
    foreach (@LOGFILE) {
        if ($_=~/$count:\d\d:\d\d/) {
            if ($_=~/UNBIND/) {
                ++$unbind;
            } elsif ($_=~/BIND/) {
                ++$bind;
            }
        }
    }
    my $total=$bind+$unbind;
    printf ("Hour $count has %5d BINDS and %5d UNBINDS TOTAL=%-6d\n",$bind,$unbind,$total);
    # print"BINDS UNBINDS\n";
    # printf ("%-5d %5d\n",$bind,$unbind,$total);
}
As you can see it simply spits out BINDS, UNBINDS, and the total per hour. What I would like is to be able to pass in a variable that will put the output into chunks of arbitrary time slices. So if I say parse_file.pl --interval=300 (I'll keep it in seconds for now to keep the problem domain simple) it would spit out something like this:
Hour 00:00:00 has 1433 BINDS and 1423 UNBINDS TOTAL=2856
Hour 00:05:00 has 1501 BINDS and 1502 UNBINDS TOTAL=3003
Hour 00:10:00 has 1278 BINDS and 1279 UNBINDS TOTAL=2557
Hour 00:15:00 has 1269 BINDS and 1262 UNBINDS TOTAL=2531
My initial thoughts are to use the Date::Manip module (it is my favorite module for monkeying around with minutes, seconds, hours, days, etc...) to convert each of the time stamps into unix time, and see if it falls into the range given by the --interval= argument. I know I need a bit more math in there for specifying unix time for Oct 17 00:00:00, and then adding 300 seconds (or is it 299, details, details) and see if each falls in between the range of 00:00:00, and 00:04:59. If so, aggregate those, then move on.
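The range idea above can be sketched without Date::Manip, under one loud assumption: only the time of day matters, so the date on each line is ignored and a log that crosses midnight would fold both days into the same slices. The helper names `bucket_label` and `count_by_interval` are made up for this illustration. Each timestamp is converted to seconds since midnight and rounded down to a multiple of the interval, which also settles the 300-versus-299 question: the slice starting at 00:00:00 holds everything up to 00:04:59, and the next slice starts exactly 300 seconds later.

```perl
use strict;
use warnings;

# Hypothetical helper: map an HH:MM:SS time to the start of its slice,
# where $interval is the slice length in seconds (e.g. 300 for five minutes).
sub bucket_label {
    my ( $hh, $mm, $ss, $interval ) = @_;
    my $secs  = $hh * 3600 + $mm * 60 + $ss;             # seconds since midnight
    my $start = int( $secs / $interval ) * $interval;    # round down to slice start
    return sprintf '%02d:%02d:%02d',
        int( $start / 3600 ), int( $start / 60 ) % 60, $start % 60;
}

# Hypothetical helper: count BIND/UNBIND lines per slice.
# Assumption: the date is ignored, only the time of day is used.
sub count_by_interval {
    my ( $lines, $interval ) = @_;
    my %data;
    for my $line ( @$lines ) {
        my ( $hh, $mm, $ss, $op ) =
            $line =~ /(\d\d):(\d\d):(\d\d).+?(UNBIND|BIND)/
            or next;                                     # skip lines that do not match
        $data{ bucket_label( $hh, $mm, $ss, $interval ) }{ $op }++;
    }
    return \%data;
}

# Sample lines in the format shown in the question (serial numbers invented).
my @sample = (
    'up27 ,Oct 17 00:00:09, 2605039,BIND',
    'up27 ,Oct 17 00:00:09, 2605039,UNBIND',
    'up27 ,Oct 17 00:04:59, 2605040,BIND',
    'up27 ,Oct 17 00:05:00, 2605041,BIND',
);

my $data = count_by_interval( \@sample, 300 );
for my $slice ( sort keys %$data ) {
    my $bind   = $data->{ $slice }{ BIND }   || 0;       # a slice may lack one kind
    my $unbind = $data->{ $slice }{ UNBIND } || 0;
    printf "%s has %5d BINDS and %5d UNBINDS TOTAL=%-6d\n",
        $slice, $bind, $unbind, $bind + $unbind;
}
```

In a real script the interval would presumably come from the command line, for example via the core Getopt::Long module, rather than being hard-coded as it is here.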

I may very well end up doing it this way, but I'm always interested in learning new ways to do it (TIMTOWTDI after all). I'll end this by re-stating that I am in no way a genius, so if anyone decides to respond with some form of black magic, super kung-fu solution, it will need LOTS of explaining if there's any hope of me understanding it.

I hold you all in great esteem, and have lurked here for quite a while learning LOTS and LOTS. So let me say thank you all for knowledge that I would never get in my day to day job.

Very funny Scotty... Now PLEASE beam down my PANTS!

Replies are listed 'Best First'.
Re: Parsing LDAP log file given a time period
by jwkrahn (Abbot) on Oct 20, 2006 at 17:38 UTC
    You may be better off using a hash for your code:
    #!/usr/local/bin/perl -w
    use strict;

    my $processed_log = $ARGV[ 0 ];

    # Now that we have the logs, let's give a breakdown of BIND/UNBINDS
    # per hour. To start, I'll make it easy and just do them on an hourly
    # basis (eg. 09:00-09:59, 10:00-10:59, etc...)
    open PROCESSED_LOG, '<', $processed_log or die "Cannot open $processed_log: $!\n";
    print"BINDS UNBINDS\n";

    my %data;
    while ( <PROCESSED_LOG> ) {
        if ( /(\d\d):\d\d:\d\d.+?(UNBIND|BIND)/ ) {
            $data{ $1 }{ $2 }++;
        }
    }

    for my $hour ( sort keys %data ) {
        printf "Hour %s has %5d BINDS and %5d UNBINDS TOTAL=%-6d\n",
            $hour,
            @{ $data{ $hour } }{ qw/BIND UNBIND/ },
            $data{ $hour }{ BIND } + $data{ $hour }{ UNBIND };
    }
    If you wanted to do it in five minute intervals then:
    #!/usr/local/bin/perl -w
    use strict;

    my $processed_log = $ARGV[ 0 ];
    my $minute_interval = 5; # or use command line or getopts()

    # Now that we have the logs, let's give a breakdown of BIND/UNBINDS
    # per hour. To start, I'll make it easy and just do them on an hourly
    # basis (eg. 09:00-09:59, 10:00-10:59, etc...)
    open PROCESSED_LOG, '<', $processed_log or die "Cannot open $processed_log: $!\n";
    print"BINDS UNBINDS\n";

    my %data;
    while ( <PROCESSED_LOG> ) {
        if ( /(\d\d):(\d\d):\d\d.+?(UNBIND|BIND)/ ) {
            my $time = sprintf '%02d:%02d', $1, int( $2 / $minute_interval ) * $minute_interval;
            $data{ $time }{ $3 }++;
        }
    }

    for my $hour ( sort keys %data ) {
        printf "Hour %s has %5d BINDS and %5d UNBINDS TOTAL=%-6d\n",
            $hour,
            @{ $data{ $hour } }{ qw/BIND UNBIND/ },
            $data{ $hour }{ BIND } + $data{ $hour }{ UNBIND };
    }
      Uhmm, wow. That was brilliant. No magic here, but magnificent mathematics at work. Dividing the minutes by the requested interval, truncating with int, and then multiplying by the interval again yields the start of the interval each timestamp falls in (int was instrumental, I see this now). And using a hash to count the number of BINDS/UNBINDS per interval is inspiring to me.
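To make that arithmetic concrete, here is a tiny worked sketch. The helper name `slot_start` is invented for illustration and is not part of the code above. It also shows one consequence worth knowing: if the interval does not divide 60 evenly, the last slot of each hour comes out shorter than the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: round a minute value down to the start of its slot.
sub slot_start {
    my ( $minute, $interval ) = @_;
    return int( $minute / $interval ) * $interval;
}

# With a 5-minute interval, minutes 0-4 land in slot 0, 5-9 in slot 5, and so on.
printf "minute %2d -> slot %02d (interval 5)\n", $_, slot_start( $_, 5 )
    for 0, 4, 5, 7, 59;

# An interval that does not divide 60 leaves a shorter final slot:
# with 7, only minutes 56-59 land in slot 56.
printf "minute %2d -> slot %02d (interval 7)\n", $_, slot_start( $_, 7 )
    for 55, 56, 59;
```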

      I've often tried to incorporate hashes into programs to take advantage of their inherent nature to group data (am I saying that right? *shrug*), but I do not use them often enough to recognize problems like this one, which a hash makes easier to deal with.

      What is that skill called? I want to say data-something or other.

      Thank you for the quick reply, it works beautifully.

      Now, that's not to say I've let the rest of you off the hook, I'd still like to see how you might have tackled this problem. I'll add the above to my repertoire of code that I routinely search through to help me conquer these types of problems.

      Very funny Scotty... Now PLEASE beam down my PANTS!

        I've often tried to incorporate hashes into programs to take advantage of their inherent nature to group data (am I saying that right? *shrug*), but I do not use them often enough to recognize problems like this one, which a hash makes easier to deal with.

        What is that skill called? I want to say data-something or other.

        Data structures, perhaps? Knowing how to store your data so that it is easy to manipulate for your intended purpose is, indeed, an important programming skill — in any programming language.

        Ask yourself, "What do I need to be able to do with these data?" Let the answer drive your data structures.
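As a rough illustration of letting the question drive the structure: here the question is "how many of each operation happened in each hour?", which suggests a hash keyed by hour whose values are hashes keyed by operation. The counts are the first two hours from the sample output in the original post. The helper `hour_total` is a made-up name for this sketch.

```perl
use strict;
use warnings;

# Outer key: the hour. Inner key: the operation. Value: how many were seen.
my %data = (
    '00' => { BIND => 1433, UNBIND => 1423 },
    '01' => { BIND => 1501, UNBIND => 1502 },
);

# Hypothetical helper: total operations for one hour,
# treating a missing hour or a missing count as 0.
sub hour_total {
    my ( $data, $hour ) = @_;
    my $counts = $data->{ $hour } || {};
    return ( $counts->{ BIND } || 0 ) + ( $counts->{ UNBIND } || 0 );
}

for my $hour ( sort keys %data ) {
    printf "Hour %s TOTAL=%d\n", $hour, hour_total( \%data, $hour );
}
```

Once the data sits in that shape, both questions ("how many BINDs in hour 01?" and "what is the total for hour 00?") become single lookups rather than another pass over the log.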


        Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
Re: Parsing LDAP log file given a time period
by andyford (Curate) on Oct 20, 2006 at 16:56 UTC

Node Type: perlquestion [id://579636]
Approved by ww
Front-paged by andyford