PerlMonks  

How do I read a log file that contents recurring log messages those are separated by newline characters?

by WantToBeJediInPerl (Initiate)
on Oct 14, 2010 at 19:06 UTC ( [id://865335]=perlquestion )

WantToBeJediInPerl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've been trying to think of a way to do this for a while and I'm stuck. I code in C but am pretty new to Perl.

This is what I am trying to achieve:

I have a log file that contains log messages separated by newline characters. I need to write a Perl program that finds the top 8 most frequently recurring log messages. Please note that the log file might be too big to fit in memory at one time.

Assume the log file has a format similar to the Linux syslog format, as follows:

Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_read
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_write
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_mmap
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_sendfile
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: disagrees about version of symbol zone_table
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol zone_table
Mar 9 08:15:05 gen-vcs11 kernel: kjslahdisagrees about version of symbol unlock_page
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol filemap_fdatawrite
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol find_or_create_page
and so on. Any help would be greatly appreciated! Thanks!

Replies are listed 'Best First'.
Re: How do I read a log file that contents recurring log messages those are separated by newline characters?
by toolic (Bishop) on Oct 14, 2010 at 19:22 UTC
    Store your messages as keys in a hash, and increment the count.
    use strict;
    use warnings;

    my %msgs;
    while (<DATA>) {
        s/^\s+//;
        chomp;
        $msgs{$_}++;
    }

    # Sort by number of occurrences and only show top 8:
    my $i = 0;
    for my $m (sort {$msgs{$b} <=> $msgs{$a}} keys %msgs) {
        print "$msgs{$m} $m\n";
        $i++;
        last if $i == 8;
    }

    __DATA__
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_read
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_write
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_mmap
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol generic_file_sendfile
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: disagrees about version of symbol zone_table
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol zone_table
    Mar 9 08:15:05 gen-vcs11 kernel: kjslahdisagrees about version of symbol unlock_page
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol filemap_fdatawrite
    Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol find_or_create_page

    See also:

    perlintro

    perldoc -q sort

      You are including the time stamps in the key, so identical events a second apart increment separate counts. Or maybe you were showing the overall concept, and leaving the trimming as an exercise for the student?

      It looks like gen-vcs11 kernel is a standard component of every line, so I would ignore it. I would use split to extract the second and third components and use those as a key:

      my ( $code, $msg ) = ( split /: /, $_, 3 )[1,2]; $msgs{$code}{$msg}++;

      It becomes even simpler if you only want to preserve the msg component.
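A minimal sketch of both variants, assuming every line follows the "timestamp host kernel: code: message" layout of the sample and that the fields are separated by a colon followed by a space. The helper name count_by_code and the two hash names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count messages keyed on the text after "kernel: ",
# split into a module tag ($code) and the message body ($msg).
sub count_by_code {
    my @lines = @_;
    my ( %by_code, %by_msg );
    for my $line (@lines) {
        chomp( my $copy = $line );
        # A limit of 3 keeps any later ": " inside the message intact.
        my ( $code, $msg ) = ( split /: /, $copy, 3 )[ 1, 2 ];
        next unless defined $msg;    # skip lines that do not fit the assumed layout
        $by_code{$code}{$msg}++;     # counts grouped by code
        $by_msg{$msg}++;             # counts for the message alone
    }
    return ( \%by_code, \%by_msg );
}

my ( $by_code, $by_msg ) = count_by_code(
    "Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page\n",
    "Mar 9 08:15:06 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page\n",
    "Mar 9 08:15:06 gen-vcs11 kernel: kjslah: Unknown symbol zone_table\n",
);
print "$by_msg->{$_} $_\n" for sort keys %$by_msg;
```

Because the time stamp sits in the first field, the two unlock_page lines land on the same key even though they are a second apart.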

      As Occam said: Entia non sunt multiplicanda praeter necessitatem.

        or maybe you were showing the overall concept, and leaving the trimming as an exercise for the student?
        Yes.
Re: How do I read a log file that contents recurring log messages those are separated by newline characters?
by pileofrogs (Priest) on Oct 14, 2010 at 19:16 UTC

    This is what Perl is great for.

    Probably the most direct approach would be to parse each line with a regex to get the parts you care about (i.e., not the timestamp). Create a hash where the key is the relevant part of the line and the value is a number that you increment every time you find the same message. Notice you're not keeping the whole file in memory, just one instance of each distinct line and a number.
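The approach described above could be sketched roughly as follows. The timestamp regex assumes the "Mon D HH:MM:SS" prefix seen in the sample, and the names count_messages and top_n are hypothetical. The file is read one line at a time, so memory use grows with the number of distinct messages rather than with the file size:

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle one line at a time and count each
# message with its leading syslog timestamp removed.
sub count_messages {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Assumed layout: "Mon D HH:MM:SS rest-of-line"
        $line =~ s/^\s*[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+//;
        next if $line eq '';
        $count{$line}++;
    }
    return \%count;
}

# Return the $n most frequent messages as [count, message] pairs.
# Ties are broken alphabetically so the order is stable.
sub top_n {
    my ( $count, $n ) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $#sorted = $n - 1 if @sorted > $n;
    return map { [ $count->{$_}, $_ ] } @sorted;
}

# Demo on an in-memory filehandle; a real run would open the log file instead.
my $sample = <<'LOG';
Mar 9 08:15:05 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
Mar 9 08:15:06 gen-vcs11 kernel: kjslah: Unknown symbol unlock_page
Mar 9 08:15:07 gen-vcs11 kernel: kjslah: Unknown symbol zone_table
LOG
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
printf "%d %s\n", @$_ for top_n( count_messages($fh), 8 );
close $fh;
```

If nearly every line in the log is unique, the hash itself can still outgrow memory; in that case an external sort or an on-disk hash would be needed, which this sketch does not attempt.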

    --Pileofrogs

Re: How do I read a log file that contents recurring log messages those are separated by newline characters?
by wwe (Friar) on Oct 15, 2010 at 14:25 UTC
    Maybe you want to visit this excellent site (for a filtered view: http://www.loganalysis.org/), which discusses general strategies and also holds some (Perl) programs for log analysis.
