Parsing logs and bookmarking last line parsed

by JaeDre619 (Acolyte)
on Aug 19, 2010 at 03:41 UTC

JaeDre619 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am definitely a newbie, and Perl is my first programming language! In the last few weeks I've managed to write some basic code to parse a log file and extract some needed data. After using it for a few days, I'm finding that I need to rework it to get a complete data set, and I need your help in reworking and/or adding to it.

Code synopsis: analyze the logfile based on a date. For the given date, read each line in the log and, for each backup set, gather particular name(attribute)=value pairs.

My issue: I need to change this logic to read not from a given date, but from the point where I last read the log. I have another shell script that pulls in a copy of the logfile to be read, and each new copy can have records added since the last run. So I need to somehow bookmark my position; this is the piece I need help coding.

For example, if the backup (backup.set2_lvm) starts the night before (Aug 15 20:00) and runs until Aug 16 00:33, I don't capture the Aug 15 20:00 entries, since my script is set to read only the data for the 16th.

My raw logfile/input looks something like this:
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-type=regular
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date-epoch=1281927603
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-directory=/home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010: backup.set1_lvm:backup:INFO: START OF BACKUP
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-set=backup.set1_lvm
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date=20100816000003
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-type=regular
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date-epoch=1281942003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: last-backup=/home/backups/backup.set2_lvm_lvm/20100814200003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-size=424.53 GB
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-time=04:33:12
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: Backup succeeded
Mon Aug 16 00:33:16 2010: backup.set2_lvm_lvm:backup:INFO: END OF BACKUP
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: last-backup=/home/backups/backup.set1_lvm/20100815000006
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-size=187.24 GB
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-time=01:59:04
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: Backup succeeded
Mon Aug 16 01:59:09 2010: backup.set1_lvm:backup:INFO: END OF BACKUP

Basically, I would like to keep track of the events per "backup.set#", so the next time the script runs it looks only at the new events for that set.

I hope that makes sense. Please share any good examples of doing this if you have any. Thanks.

My code:
use strict;
use warnings;
use File::Basename;
use Data::Dumper;

my ($ServerName) = @ARGV;   # ARGV = /var/log/server1.mydomain.com.backup-software.log
my %MyItems;
my $mon;
my $day;
my $year;
my $debug = 0;              # set to 1 for debug output

foreach my $ServerName (@ARGV) {
    while (my $line = <>) {
        chomp $line;
        print "Line: $line\n" if $debug;
        ($mon, $day, $year) = spGetCurrentDateTime();
        ($mon, $day, $year) = split(" ", $mon);
        if ($line =~ m/(.* $mon $day) \d{2}:\d{2}:\d{2} $year: ([^:]+):backup:/) {
            my $ServerName = basename $ARGV, '.mydomain.com.backup-software.log';
            my $BckupDate  = "$1 $year";
            my $BckupSet   = $2;
            $MyItems{$ServerName}{$BckupSet}->{'1-Server'}    = $ServerName;
            $MyItems{$ServerName}{$BckupSet}->{'2-Logdate'}   = $BckupDate;
            $MyItems{$ServerName}{$BckupSet}->{'3-BackupSet'} = $BckupSet;
            if ($line =~ m/.* \w+ \d{2} (\d{2}:\d{2}:\d{2}) \d{4}: ([^:]+):backup:.*(START OF BACKUP)/) {
                my $BckupKey = $2;
                my $BckupVal = $1;
                $MyItems{$ServerName}{$BckupSet}->{'4-StartTime'} = $BckupVal;
            }
            if ($line =~ m/(backup-time)[:=](.+)/) {
                my $BckupKey = "5-Duration";
                my $BckupVal = $2;
                $MyItems{$ServerName}{$BckupSet}->{$BckupKey} = $BckupVal;
            }
            if ($line =~ m/(backup-size)[:=](.+)/) {
                my $BckupKey = "6-Size";
                #my $BckupKey = $1;
                my $BckupVal = $2;
                $MyItems{$ServerName}{$BckupSet}->{$BckupKey} = $BckupVal;
            }
            if ($line =~ m/(Backup succeeded)/) {
                my $BckupKey = "7-Status";
                my $BckupVal = "Succeeded";
                $MyItems{$ServerName}{$BckupSet}->{$BckupKey} = $BckupVal;
            }
            if ($line =~ m/(ERROR)[:=](.+)/) {
                my $BckupKey = "8-Status";
                my $BckupVal = "Unsuccessful";
                $MyItems{$ServerName}{$BckupSet}->{$BckupKey} = $BckupVal;
            }
        }
    }

    sub spGetCurrentDateTime {
        my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
        my @abbr = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
        my $currentDateTime = sprintf "%s %02d %4d", $abbr[$mon], $mday, $year + 1900;   # Returns => 'Jul 26 2010'
        return $currentDateTime;
    }

    #print Dumper(\%MyItems);
    for my $ServerName (keys %MyItems) {
        for my $BckupSet (keys %{ $MyItems{$ServerName} }) {
            for (sort keys %{ $MyItems{$ServerName}{$BckupSet} }) {
                print $_, '=', $MyItems{$ServerName}{$BckupSet}{$_}, ';';
            }
            print "\n";
        }
    }
}
I appreciate the help!

Replies are listed 'Best First'.
Re: Parsing logs and bookmarking last line parsed
by moritz (Cardinal) on Aug 19, 2010 at 06:50 UTC

    You can obtain the current position in a file with tell, and later seek to the same position.

    Note that this works only if data is only appended, and old data is never deleted or changed in length.
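
    A minimal sketch of that approach (the bookmark file name offset.dat is made up for this example, and the log is assumed to be append-only):

use strict;
use warnings;

my $log      = '/var/log/server1.mydomain.com.backup-software.log';
my $bookmark = 'offset.dat';    # hypothetical file holding the last byte offset

# Read the saved offset; default to 0 on the very first run.
my $offset = 0;
if (open my $bm, '<', $bookmark) {
    $offset = <$bm>;
    close $bm;
    chomp $offset if defined $offset;
    $offset = 0 unless $offset;
}

open my $fh, '<', $log or die "Cannot open $log: $!";
seek $fh, $offset, 0;           # jump back to where the last run stopped

while (my $line = <$fh>) {
    chomp $line;
    # ... parse $line exactly as before ...
}

# Remember how far we got for the next run.
open my $bm, '>', $bookmark or die "Cannot write $bookmark: $!";
print {$bm} tell($fh), "\n";
close $bm;
close $fh;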

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Parsing logs and bookmarking last line parsed
by murugu (Curate) on Aug 19, 2010 at 04:12 UTC
    JaeDre619,

    One way of doing it is to write the last time stamp from the logfile into a text file at the end of the script's run. The next time you run the script, use the time stamp from that text file to skip the already-processed entries inside the loop.
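
    A rough sketch of that idea (the bookmark file name last_seen.txt is made up for the example; Time::Local is in the core distribution). One rough edge: entries sharing the exact same second as the saved time stamp are skipped as well, which the tell/seek approach mentioned elsewhere in this thread avoids:

use strict;
use warnings;
use Time::Local;

my %mon;
@mon{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Turn "Sun Aug 15 20:00:03 2010" into an epoch value for easy comparison.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($mname, $day, $time, $year) =
        $stamp =~ /^\w+ (\w+) +(\d+) (\d\d:\d\d:\d\d) (\d{4})/ or return;
    my ($h, $m, $s) = split /:/, $time;
    return timelocal($s, $m, $h, $day, $mon{$mname}, $year);
}

my $bookmark  = 'last_seen.txt';    # hypothetical bookmark file
my $last_seen = 0;
if (open my $bm, '<', $bookmark) {
    $last_seen = <$bm>;
    close $bm;
    chomp $last_seen if defined $last_seen;
    $last_seen = 0 unless $last_seen;
}

my $newest = $last_seen;
while (my $line = <>) {
    my ($stamp) = $line =~ /^(.*?\d{4}):/ or next;
    my $epoch   = stamp_to_epoch($stamp) or next;
    next if $epoch <= $last_seen;   # already handled in an earlier run
    $newest = $epoch if $epoch > $newest;
    # ... parse $line as before ...
}

open my $bm, '>', $bookmark or die "Cannot write $bookmark: $!";
print {$bm} "$newest\n";
close $bm;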

    There must be a better way than what I suggested; hang around this site and you will get better solutions from the best monks here.

    Regards,
    Murugesan Kandasamy
    use perl for(;;);

      That's kind of what I was thinking too. How would I skip the lines already read and read only the new lines? Is there a good Perl module for this?
        I missed this comment before posting my reply below. It would help if you could explain what you intend to use this "last processing occurred at XYZ date/time" for.

        I'm actually not sure that you need this concept at all. If you just need the last data for each backup set, then I would process the input file, replacing old info with new as it becomes available. Then the output becomes "hey, here is the most recent stuff I have". All of this processing will be so fast that there is no need to keep track of what you did before; just do it all again to keep things simple. I mean, there are 86,400 seconds in a day, and running a program once per day that takes one second is nothing in the scheme of things!

        The problem I ran into was that the data for each backup set doesn't appear to be "symmetric". In other words, sometimes some parm values are "missing". This can cause a previous value to keep being "carried forward" when that is not the right thing to do.

        Rather than getting into some "spec war", I'm posting a simple-minded use of my previously posted code to report the "last values" of each set; then you can tell me, "Hey, this would have been right if it had done X". Below I didn't use $date; I don't know why you need $date.

        In doing this short thing, I noticed that $parm could have a leading space, so I changed a regex.

use strict;
use Data::Dumper;

my %backups;
while (<DATA>) {
    next if (/^\s*$/);   #skip blank lines
    chomp;
    my ($date, $backupset, $parm, $value) = parseline($_);
    if ($value) {
        $backups{$backupset}{$parm} = $value;
    }
}
print Dumper \%backups;

sub parseline {
    my $line = shift;
    my ($date, $rest) = $line =~ m/(^.*\d{4}):(.*)/;
    my ($backupset, $msg) = split(/backup:INFO:/, $rest);
    $backupset =~ s/:\s*$//;        #trim some unwanted thing like ':' is ok
    $backupset =~ s/^\s*backup\.//; #more than one step is just fine!
    my ($parm, $value) = $msg =~ m/\s*(.*)=\s*(.*)\s*/;
    $parm ||= $msg;   #if match doesn't happen these will be undef
    $value ||= "";    #this trick makes sure that they are defined.
    return ($date, $backupset, $parm, $value);
}

=print
#some reformatting to try to stop line wrap....
$VAR1 = {
  'set1_lvm' => {
    'backup-size' => '187.24 GB',
    'backup-set' => 'backup.set1_lvm',
    'backup-time' => '01:59:04',
    'backup-date-epoch' => '1281942003',
    'backup-status' => 'Backup succeeded',
    'last-backup' => '/home/backups/backup.set1_lvm/20100815000006',
    'backup-type' => 'regular',
    'backup-date' => '20100816000003'
  },
  'set2_lvm_lvm' => {
    'backup-size' => '424.53 GB',
    'backup-time' => '04:33:12',
    'backup-status' => 'Backup succeeded',
    'last-backup' => '/home/backups/backup.set2_lvm_lvm/20100814200003'
  },
  'set2_lvm' => {
    'backup-directory' => '/home/backups/backup.set2_lvm/20100815200003',
    'backup-set' => 'backup.set2_lvm',
    'backup-date-epoch' => '1281927603',
    'backup-type' => 'regular',
    'backup-date' => '20100815200003'
  }
};
=cut

__DATA__
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003
..... use __DATA__ segment from my previous post

        If you're ok with the idea of recording to a file the last time stamp that was used, it should be pretty simple to record which line you were last on too. You'll just need to modify your while loop a bit by adding a variable to keep track of the line numbers.

        For a simple illustration, let's say you read from your new assistant file the last time stamp and the last line number read, and that the last line read is stored in the variable $last_line_read. The code below illustrates the modification you would need to make.

my $line_count = 0;
while (my $line = <>) {
    $line_count++;
    next if ($line_count <= $last_line_read);
    # the rest of your code from the while remains the same
}
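
        To complete the picture, the saved line number also has to be read at start-up and written back at the end of the run; here is a minimal self-contained sketch (the bookmark file name lastline.txt is made up for the example):

use strict;
use warnings;

my $bookmark = 'lastline.txt';   # made-up name for the bookmark file

# Read the previously saved line number (0 on the very first run).
my $last_line_read = 0;
if (open my $bm, '<', $bookmark) {
    $last_line_read = <$bm>;
    close $bm;
    chomp $last_line_read if defined $last_line_read;
    $last_line_read = 0 unless $last_line_read;
}

my $line_count = 0;
while (my $line = <>) {
    $line_count++;
    next if $line_count <= $last_line_read;
    # ... the rest of the original while-loop code goes here ...
}

# Save the new position for the next run.
open my $bm, '>', $bookmark or die "Cannot write $bookmark: $!";
print {$bm} "$line_count\n";
close $bm;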

        I'm not saying that this is the "best" way to do it, but it should work.

Re: Parsing logs and bookmarking last line parsed
by Anonymous Monk on Aug 19, 2010 at 04:25 UTC
Re: Parsing logs and bookmarking last line parsed
by toolic (Bishop) on Aug 19, 2010 at 15:30 UTC
    Unrelated to your question, you could simplify your spGetCurrentDateTime sub using POSIX and localtime (both of which are in the core Perl distribution) as follows:
use POSIX qw(strftime);

sub spGetCurrentDateTime {
    return strftime('%b %d %Y', localtime);
}
Re: Parsing logs and bookmarking last line parsed
by Marshall (Canon) on Aug 19, 2010 at 19:43 UTC
    I hope a couple of pointers will help you out.

    First, on the date format: if you can (and I'm not sure that you can), using something like 20100815200003, i.e. "2010-08-15 20:00:03", on the left-hand side of your log file would be much preferred over "Sun Aug 15 20:00:03 2010", because a simple alphanumeric comparison or sort can be done on that type of string without converting to epoch time. The leading zeroes are important; otherwise a simple sort won't work. If you want to, add the redundant info like "Sun" as a separate field for the humans to read.
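
    A small illustration of why that format is handy (the time stamps below are just the log's, rewritten in the sortable form):

use strict;
use warnings;

# Zero-padded, year-first time stamps sort chronologically as plain strings,
# so no conversion to epoch time is needed.
my @stamps = ('2010-08-16 00:33:15', '2010-08-15 20:00:03', '2010-08-16 00:00:04');
print "$_\n" for sort @stamps;      # default string sort is already chronological

# "Is this entry newer than my bookmark?" is likewise a plain string comparison.
my $bookmark = '2010-08-15 20:00:03';
print "newer\n" if $stamps[0] gt $bookmark;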

    I wasn't exactly able to figure out what you are doing with the data, although your data structure might be more complex than necessary. If you want to keep track of where your last processing left off, I would just make a separate file and put that date/time code in it. If you use a time format like the above, then you can use a simple cmp for less than, equal, or greater than. If this extra "bookmark" file isn't there, I would process the whole file and then generate the bookmark file. I would not recommend appending any "hey, I got here last time" info to your log file. Whatever the thing is that generates that log, leave it alone and don't mess with its data.

    My inclination would be to concentrate the parsing of the input lines into one sub, and I did that below. I wouldn't worry about being fancy; just get the job done. I didn't agonize over "the best way"; I just wanted to show a couple of techniques. Improve the code later if you need to. Performance of this sub will not be an issue, only "correctness" of the parsing. Of course, when you have a "pair", that screams hash table. Usually there is no need to modify what I called $parm in my code.

    Anyway, let us know how you are getting on. I commend you for tackling a hard problem as a "first assignment".

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    next if (/^\s*$/);   #skip blank lines
    chomp;
    my ($date, $backupset, $parm, $value) = parseline($_);

    # the idea is to concentrate the parsing of the line and its
    # associated "regex-foo" into one place. I think rest of your
    # code can use simple eq or ne comparisons.

    print "$date\n";
    print " BACKUP SET = $backupset\n";
    if ($value eq "") { print " SINGLE TOKEN: $parm\n"; }
    else              { print " PAIR: $parm IS $value\n"; }
}

sub parseline {
    my $line = shift;
    my ($date, $rest) = $line =~ m/(^.*\d{4}):(.*)/;
    my ($backupset, $msg) = split(/backup:INFO:/, $rest);
    $backupset =~ s/:\s*$//;        #trimming some unwanted thing like ':' is ok
    $backupset =~ s/^\s*backup\.//; #more than one step is just fine too!
    my ($parm, $value) = $msg =~ m/(.*)=(.*)/;
    $parm ||= $msg;   #if match doesn't happen these will be undef
    $value ||= "";    #so this trick makes sure that they are defined.
    return ($date, $backupset, $parm, $value);
}

=prints...
Sun Aug 15 20:00:03 2010
 BACKUP SET = set2_lvm
 SINGLE TOKEN: START OF BACKUP
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-set IS backup.set2_lvm
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date IS 20100815200003
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-type IS regular
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date-epoch IS 1281927603
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-directory IS /home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010
 BACKUP SET = set1_lvm
 SINGLE TOKEN: START OF BACKUP
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-set IS backup.set1_lvm
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date IS 20100816000003
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-type IS regular
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date-epoch IS 1281942003
Mon Aug 16 00:33:15 2010
 BACKUP SET = set2_lvm_lvm
 PAIR: last-backup IS /home/backups/backup.set2_lvm_lvm/20100814200003
.... and so forth ....
=cut
    your data as a __DATA__ segment is here:

      @Marshall - thank you very much for your insight. What started as a way to summarize this data turned into a bigger project than I anticipated, but I felt it was a good assignment for learning Perl.

      You definitely nailed what I needed which was keying off the backup set and extracting some attributes associated with it.

      My goal for this output was to produce a delimited file, as you can see in my print statements. Prefixing the attributes with a numbering system seemed to help me with sorting. The file is needed as input to an HTML table. Anyway, I'll review your pointers and code. Thanks again.

        Wow! You've taken on a pretty difficult "first assignment"! And you've gotten a heck of a lot further than most could have done! There are some "quirks" about this that make some of the details difficult.

        I posted some more code for you. Take a look and see "what is missing/not right".

        Update: I see why you did my $BckupKey="5-Duration";. Don't do this "5-" "decoration" of the hash key. There are better, albeit more advanced, techniques for specifying the sort order. Concentrate on getting the data you need, and then you can get help here on how to make it appear in the "right" order.

        Below is just one example of a special sort order. A more robust version would take into account what happens when the order of one input string vs. another hasn't been specified. I am just saying that advanced sorting is one of the things Perl is very good at.

#!/usr/bin/perl -w
use strict;

my @special_order = ("x", "b", "a", "y");
my $i = 0;
my %sort_order = map { $_ => $i++ } @special_order;

my @array = ("a", "x", "y", "b");
@array = sort @array;
print "Regular Sort: @array\n";

@array = ("a", "x", "y", "b");
@array = sort by_order @array;
print "Special Sort: @array\n";

sub by_order {
    my $a_order = $sort_order{$a};
    my $b_order = $sort_order{$b};
    $a_order <=> $b_order;
}

__END__
prints:
Regular Sort: a b x y
Special Sort: x b a y
