virus log parser

by phaedo (Initiate)
on Jul 02, 2002 at 20:04 UTC ( [id://178983]=perlquestion )

phaedo has asked for the wisdom of the Perl Monks concerning the following question:

I'm a Perl novice... trying to expand my Perl knowledge beyond system administration. I have an anti-virus log file that I would like to eventually put into a MySQL database, but I'm having problems parsing it. The log file is a large flat file with "----" lines as record separators. Here's an example...

From: pminich@foo.com
To: esquared@foofoo.com
File: value.scr
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
Date: 06/30/2002 00:01:21
From: mef@mememe.com
To: inet@microsoft.com
File: Nr.pif
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
...
...

I'm trying to place each record on one line (for a SQL load). The list screams multi-dimensional array with hash references, but those weren't covered in "Learning Perl" and I'm having difficulty applying the examples in "The Perl Cookbook". I know I can set the input record separator to $/='---' to separate my records, but then what? I'm not sure what to do with each record or how to parse it beyond that. I'm currently building a non-multi-dimensional approach with simple RE conditionals...

open(LF,"$logFile") || die "Can't open $logFile: $!\n";
open(OF,">$outputFile") || die;
while ($line=<LF>) {
    chomp($line);
    if ($line =~ /^Date:\s+(\S+)\s(\S+)/) {
        $date=$1 . " " . $2;
    }
    if ($line=~ /^From:\s+(\S+)/) {
        $from=$1;
    }
    if ($line=~ /^To:\s+\S+/) {
        # Some "To:" lines have multiple ", "
        # delimited addresses
        my ($crap, @to);
        ($crap, @to)=split(/\s+/,$line);
        print OF "$date\t$from\t@to\n";
    }
}
close(LF);
close(OF);

... which is limited to simple if/then statements (the curse of predicate logic classes) and bad regular expressions. I'm really not sure what to do. I can pull my vars out here, but each record is just a flat line. Plus it doesn't give me the functionality I really need (I will eventually need to manipulate some of these fields). I was thinking of adding something like this (from the Perl Cookbook):

for $x (1 .. 10) {
    for $y (1 .. 10) {
        $LoL[$x][$y] = func($x, $y);
    }
}

for each var so I could build my matrices; but that would leave me with something like $foo[$a][$b][$c][$d][$e]; however, I don't know what I have here -- other than a headache. Any suggestions would be greatly appreciated, including more daily uses for references. -- Phaedo

Replies are listed 'Best First'.
Re: virus log parser
by Rhose (Priest) on Jul 02, 2002 at 20:44 UTC
    How about collecting the information, then printing the record when you get to one of the '-----' lines? (This assumes all records -- even the last one -- end with a '-----' line.)

    The following code reads from __DATA__ and writes its (tab delimited) records to the screen; you would probably want to open your log file for processing (open(LF,"$logFile")), and write to a results file (open(OF,">$outputFile")).

    #!/usr/bin/perl -w
    use strict;

    my $gCurRec;
    foreach(qw(name to file action virus)) { $gCurRec->{$_}=''; }

    while(<DATA>) {
        $gCurRec->{name}=$1   if (/^From:\s*(.+?)\s*$/);
        $gCurRec->{to}=$1     if (/^To:\s*(.+?)\s*$/);
        $gCurRec->{file}=$1   if (/^File:\s*(.+?)\s*$/);
        $gCurRec->{action}=$1 if (/^Action:\s*(.+?)\s*$/);
        $gCurRec->{virus}=$1  if (/^Virus:\s*(.+?)\s*$/);
        if (/^-----/) {
            print $gCurRec->{name},"\t",
                  $gCurRec->{to},"\t",
                  $gCurRec->{file},"\t",
                  $gCurRec->{action},"\t",
                  $gCurRec->{virus},"\n";
            foreach(qw(name to file action virus)) { $gCurRec->{$_}=''; }
        }
    }
    __DATA__
    From: pminich@foo.com
    To: esquared@foofoo.com
    File: value.scr
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    Date: 06/30/2002 00:01:21
    From: mef@mememe.com
    To: inet@microsoft.com
    File: Nr.pif
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
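    If a real log ever ends without that final '-----' line, one extra check after the loop would flush the dangling record; a minimal sketch, reusing the same field list and $gCurRec from above:

    # Only needed if the last record can lack its closing '-----' line
    # (an assumption -- the sample data always has one).
    if (grep { $gCurRec->{$_} ne '' } qw(name to file action virus)) {
        print join("\t", map { $gCurRec->{$_} } qw(name to file action virus)), "\n";
    }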

    Comment: one other thing I've come to like is the three-argument form of open. For example, instead of:

    open(OF,">$outputFile") || die;

    I use:

    open(OF,'>',$outputFile) || die;

    I hope this helps! *Smiles*

    Update:

    Now that I have re-read my code, I realize I should have made

    qw(name to file action virus)

    a constant so it was defined in just one place, and I should have made the field separator a constant as well. This would simplify changes to the code. (Not that it is critical on such a small program, but it is a good practice... well, for me at least.)
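    For example (just a sketch of that refactor, not tested against the full program), the field list and separator could live in constants:

    use constant FIELDS    => qw(name to file action virus);
    use constant SEPARATOR => "\t";

    # ...and the reset and print steps would then become:
    $gCurRec->{$_} = '' foreach FIELDS;
    print join(SEPARATOR, map { $gCurRec->{$_} } FIELDS), "\n";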

      I would like to add some random thoughts I had when I saw your code.

      First of all, the construct

      foreach(qw(name to file action virus)) { $gCurRec->{$_}=''; }
      can be expressed very succinctly using so-called hash slices, i.e.
      my @columns = qw(name to file action virus);
      @{ $gCurRec }{ @columns } = ('') x @columns;
      See for example this for a good introduction.

      Furthermore, why do you use a hash reference to store the data when a hash would be sufficient? (This is probably a matter of style.)

      Then, I usually consider multiple repeated lines with trivial differences like

      $gCurRec->{name}=$1 if (/^From:\s*(.+?)\s*$/);
      $gCurRec->{to}=$1   if (/^To:\s*(.+?)\s*$/);
      to be a sign that some kind of abstraction like a loop is needed. In this case, keying each datum by its header field
      /^(\w+):\s*(.+?)\s*$/ and $gCurRec->{$1} = $2;
      does so and furthermore removes the need to spell out the interesting header fields several times. This of course means that unknown fields like the Date: are ignored, but your code ignores them as well.

      So finally here is my attempt at implementing your algorithm:

      #!/usr/bin/perl -w
      use strict;

      my %gCurRec = ();
      while(<DATA>) {
          /^-+\s*$/ and do {
              print join("\t",
                         map { exists $gCurRec{$_} ? $gCurRec{$_} : '' }
                             qw(from to file action virus)
                    ) . "\n";
              %gCurRec = ();
              next;
          };
          /^(\w+):\s*(.+?)\s*$/ and $gCurRec{lc $1} = $2;
      }
      __DATA__
      From: pminich@foo.com
      To: esquared@foofoo.com
      File: value.scr
      Action: The uncleanable file is deleted.
      Virus: WORM_KLEZ.H
      ----------------------------------
      Date: 06/30/2002 00:01:21
      From: mef@mememe.com
      To: inet@microsoft.com
      File: Nr.pif
      Action: The uncleanable file is deleted.
      Virus: WORM_KLEZ.H
      ----------------------------------
Re: virus log parser
by joealba (Hermit) on Jul 03, 2002 at 04:17 UTC
    Here's a little gratuitous sample code for Parse::RecDescent, seeing as I just spent Monday in a class with TheDamian teaching me all about it. :)
    use strict;
    use Parse::RecDescent;
    use Data::Dumper;

    my $grammar = q{
        viruslog: message(s)
                  { %{$return} = map {@{$_}} (@{$item[1]}); }
        message:  /^(\w+):\s+ (.*)/x
                  { $return = [lc($1), $2]; }
    };

    my $parser = new Parse::RecDescent $grammar or die "Invalid grammar";

    foreach (split /---+/, join '', <DATA>) {
        my $record = $parser->viruslog($_);
        print Dumper($record) if defined $record;
    }
    __DATA__
    From: pminich@foo.com
    To: esquared@foofoo.com
    File: value.scr
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    Date: 06/30/2002 00:01:21
    From: mef@mememe.com
    To: inet@microsoft.com
    File: Nr.pif
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    Which prints:
    $VAR1 = {
              'file' => 'value.scr',
              'virus' => 'WORM_KLEZ.H',
              'to' => 'esquared@foofoo.com',
              'from' => 'pminich@foo.com',
              'action' => 'The uncleanable file is deleted.'
            };
    $VAR1 = {
              'date' => '06/30/2002 00:01:21',
              'file' => 'Nr.pif',
              'virus' => 'WORM_KLEZ.H',
              'to' => 'inet@microsoft.com',
              'from' => 'mef@mememe.com',
              'action' => 'The uncleanable file is deleted.'
            };
    Like the solutions above, this will give you a hash for each record, making it easy to insert into a database. But you'll notice that I do almost no work to achieve the result: there are really only two lines of Perl (the code blocks in the grammar) that actually do anything here, aside from the split! It will also handle any new message types if they are ever added to your log.

    And I'm sure it could be even simpler, but I don't think it's too bad for my first program with Parse::RecDescent. :)
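    Once each record is a hash like that, the database step is mostly mechanical. A rough sketch follows; the table name virus_log, the column list and the connection details are all placeholders, not anything defined above:

    use DBI;

    my @cols = qw(date from to file action virus);

    # one parsed record, as produced above (the first one has no date)
    my $record = {
        from   => 'pminich@foo.com',
        to     => 'esquared@foofoo.com',
        file   => 'value.scr',
        action => 'The uncleanable file is deleted.',
        virus  => 'WORM_KLEZ.H',
    };

    # placeholder connection details and table/column names
    my $dbh = DBI->connect('DBI:mysql:database=viruslog', 'user', 'pass',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare(
          'INSERT INTO virus_log (' . join(',', map { "`$_`" } @cols) . ')'
        . ' VALUES (' . join(',', ('?') x @cols) . ')'
    );
    $sth->execute( @{$record}{@cols} );   # missing fields (date here) become NULL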
Re: virus log parser
by jjohn (Beadle) on Jul 03, 2002 at 02:19 UTC

    I have an anti-virus log file that I would like to eventually put into a mysql database; but I'm having problems parsing it.

    This is exactly the class of problems for which Perl was designed. There are many ways to approach this problem, as has already been shown. I'd like to submit my quick and dirty version here. It reads through the log file (really anything on STDIN) and creates an array of hash references, suitable for sorting or iterating through to collect stats like most common email address or virus. This version is short and hopefully transparent.

    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my (@log, %rec);
    while(<>){
        if( /^-/ ){
            push @log, { %rec };
            %rec = ();
            next;
        }
        chomp;
        my ($k, $v) = split /\s*:\s*/, $_, 2;
        $rec{ $k } = $v if $k;
    }
    push @log, { %rec } if keys %rec;
    print Dumper(\@log);

    While slurping input, each line is checked to see if it is an "end of record" marker, which is defined here as any line beginning with a dash. If this doesn't match your reality, you will need to tinker with this line. When the end of record is found, the hash that represents that record is stuffed into the @log array. Since arrays can only hold scalar values, a hash reference is needed. Unfortunately, we can't simply use a reference taken like this: \%rec, because that hash will be erased on the very next line! Instead, we create a brand new anonymous hash with { } and stuff that away. We then clear out the "global" hash and grab the next line of input.
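    A tiny, contrived illustration of why the copy matters:

    my (@bad, @good, %rec);
    %rec = ( virus => 'WORM_KLEZ.H' );
    push @bad,  \%rec;      # a reference to the one shared hash
    push @good, { %rec };   # a reference to a fresh copy
    %rec = ();              # "next record" -- this also empties what @bad points at
    print scalar(keys %{ $bad[0]  }), "\n";   # 0
    print scalar(keys %{ $good[0] }), "\n";   # 1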

    If the line of input isn't an end-of-record line, then the newline is removed and the very potent split operator is used to separate the key from the value. This assumes that the key and value are on the same line, of course. As a defensive measure, any whitespace around the colon is consumed. The often-neglected third argument to split indicates how many fields split should produce. Even if a colon appears somewhere in the value field, it will still end up as part of the $v variable. After creating a key and a value variable ($k and $v), the record hash is populated with these values, provided the key is a true value. This prevents silly things like blank or malformed lines from disturbing your hash.
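    For example (a made-up Action line, just to show the limit of 2 at work):

    my ($k, $v) = split /\s*:\s*/, 'Action: Cleaned: renamed to value.vir', 2;
    # $k is 'Action', $v is 'Cleaned: renamed to value.vir'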

    When the loop exits, you might not have pushed the last hash into the @log array (e.g. the last record separator might have gone missing on you). Therefore, a check is made to see whether %rec still has any keys; if it does, that final record is pushed into @log as well.

    I use Data::Dumper merely to show that @log has been populated correctly. If you aren't familiar with Data::Dumper, do make yourself acquainted. It can be a real lifesaver.

    I leave the writing of the analysis of @log as an exercise for the reader. If references and dereferences make your head spin, take a look at Mark-Jason Dominus's Understand References Today.
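    For what it's worth, that exercise might begin with something like this (a sketch, assuming @log as built above; note the keys keep the capitalization from the log):

    my %virus_count;
    $virus_count{ $_->{Virus} }++ for grep { defined $_->{Virus} } @log;

    for my $virus (sort { $virus_count{$b} <=> $virus_count{$a} } keys %virus_count) {
        printf "%-20s %d\n", $virus, $virus_count{$virus};
    }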

    Hope this helps.

Re: virus log parser
by thpfft (Chaplain) on Jul 03, 2002 at 03:14 UTC

    I'm not sure what you were planning with the matrices: if you want to work further with this data, or move it into a database, you're probably best off pulling it into a hash, or an array-of-hashes.

    If the file is very large, or memory is limited, you may have to read the file line by line, as others have suggested, insert each completed record into the database and then use that to perform whatever analyses made you want to put them there in the first place.

    If you're more interested in a quick scan - how much Klez this week? - then an AoH will be more fun. You should probably still use a cursor to read the file, though. It might be more dashing to do an enormous split on /-+/, but it wouldn't be wise, especially if you reset $/ to do it. I really wouldn't do that; it's a little too sweeping.

    If there was a unique identifier with each record, then a HoH would be more useful: a big hash in which the keys come from your unique field and each value is another hash containing the foo=bar pairs you've extracted. The main advantage would be that you share a key with the original file, allowing (for example) incremental updates of the database.

    But there doesn't seem to be a useful hook like that, unless the events are rare enough that you don't mind assuming the timestamp on each entry is unique. So everything would go into an array instead, and the array index could serve as a makeshift id. You could still use the dates to act on only part of the file, or just invoke your script from logrotate.
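    If you did decide to risk keying on the timestamp, the shape would be roughly this (sketch only; %record stands for one parsed entry, stubbed here with values from the sample log, and duplicate dates would silently overwrite each other):

    my %record = ( date => '06/30/2002 00:01:21', virus => 'WORM_KLEZ.H' );

    my %by_date;
    $by_date{ $record{date} } = { %record };

    # later: act on a date range, or just the newest entries
    for my $date (sort keys %by_date) {
        print "$date -> $by_date{$date}{virus}\n";
    }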

    I'll assume that you're putting everything in a database first and then working with it later. This is pretty hasty, but tested, and I've tried to keep it readable:

    #!/usr/bin/perl
    use strict;
    use DBI;
    use Data::Dumper;

    # decide which bits of the records you want to keep
    my @fields_to_store = qw(date from to file action virus);

    # turn that into a hash with which to screen regex matches
    my %field_ok = map { $_ => 1 } @fields_to_store;

    # and two strings for the database insert statement: one of column
    # names, one with the proper number of placeholders.
    my $field_list   = join(',', @fields_to_store);
    my $placeholders = join(',', ('?') x scalar(@fields_to_store));

    # connect to the database
    my $dsn = "DBI:mysql:database=xxxx;host=localhost";
    my $dbh = DBI->connect($dsn, 'xxxx', 'xxxx', { 'RaiseError' => 1 });

    # build the instruction that will be used to insert each record
    my $insert_handle = $dbh->prepare(
        "insert into xxxx ($field_list) values ($placeholders)");

    # read the file. this %gather basket is crude, but effective
    # enough, so I offer it in the spirit of tmtowtdi
    my %gather;
    while(<DATA>) {

        # match data line?
        if (m/^(\w+):\s*(.+?)\s*$/ && $field_ok{lc $1}) {
            die "overwriting $1 field: broken" if exists $gather{lc $1};
            $gather{lc $1} = $2;
        }

        # match dividing line?
        if (m/^-+\s*$/ && keys %gather) {

            # field order matters, of course, so use the fields_to_store array
            # in a map{} to order the contents of %gather, which would
            # otherwise be jumbled
            $insert_handle->execute( map { $gather{lc $_} } @fields_to_store );
            print Dumper \%gather;
            %gather = ();
        }
    }
    $insert_handle->finish;

    __DATA__
    ----------------------------------
    Date: 06/30/2002 00:01:21
    From: pminich@foo.com
    To: esquared@foofoo.com
    File: value.scr
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    Date: 06/30/2002 00:01:21
    From: mef@mememe.com
    To: inet@microsoft.com
    File: Nr.pif
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------

    For your database to be of much use you'd really need to split the email field and store that in a separate table, with another table in between that and the main one to hold the links between log entries and addresses. By that stage it would already be worth looking for something like Class::DBI to do the drudgery for you.
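    As a very rough sketch of that layout (every table and column name here is a placeholder, and $dbh is the handle from the script above):

    # one row per log entry, one row per address, and a link table between them
    $dbh->do('CREATE TABLE entry   (entry_id   INT AUTO_INCREMENT PRIMARY KEY,
                                    log_date   DATETIME,
                                    file VARCHAR(255), action VARCHAR(255),
                                    virus VARCHAR(64))');
    $dbh->do('CREATE TABLE address (address_id INT AUTO_INCREMENT PRIMARY KEY,
                                    email      VARCHAR(255) UNIQUE)');
    $dbh->do('CREATE TABLE entry_address (entry_id   INT,
                                          address_id INT,
                                          role       VARCHAR(4))');  # 'from' or 'to'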

Re: virus log parser
by flocto (Pilgrim) on Jul 02, 2002 at 21:40 UTC

    I would not try using the input separator for doing things like this. It's a lot easier to parse the file line by line:

    # open file, DB-connection, etc..
    my %data;
    while (my $line = <INPUT>) {
        chomp ($line);
        if ($line =~ m/^-+$/) {
            &save (%data);
            %data = ();
        }
        elsif ($line =~ m/^(\w+):\s(.+)$/) {
            $data{lc($1)} = $2;
        }
        elsif ($debug) {
            warn $line;
        }
    }

    sub save {
        # up to you :)
    }

    You should note that the regex is not optimal. If it were as easy to read as the one above, I would have written m/([^:]+):\s/ and used $1 and $'. Dig into perlre if you're interested. Another thing to note is that you should make sure all the keys of the hash you want to save to the database have well-defined values! Oh, and last but not least: the only reason I wrote &save was to demonstrate that it is not a built-in function.
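    One possible shape for that check inside save() (a sketch; the field list is an assumption, and the actual storage is still up to you):

    sub save {
        my (%data) = @_;
        for my $field (qw(date from to file action virus)) {
            unless (defined $data{$field}) {
                warn "record is missing '$field'\n";
                $data{$field} = '';
            }
        }
        # ...now hand %data to a DBI insert, or print it tab-delimited.
    }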

    Regards,
    -octo-

Re: virus log parser
by yodabjorn (Monk) on Jul 03, 2002 at 02:04 UTC
    Although you have a couple of approaches described here already, I decided to munge it myself. (After all, it's Perl and there's always more than one way!)

    This code uses an array of hashes to represent the records, and the left-hand identifier (From:, Date:, etc.) is dynamically used for the hash keys. This is IMHO more flexible. Here's the code:
    #!/usr/bin/perl
    use strict ;
    use warnings ;
    use Data::Dumper ;

    my @records ;
    my $count = 0 ;

    while (<DATA>) {
        next if ( /^\n/ ) ;               # skip newlines
        if (/^--/)                        # new record
        {
            $count++ ;
            next ;
        }
        my ( $field, @words ) = split ;   # get the 2 needed fields
        $field =~ s/://g ;                # drop the ":"
        my $data = join " ", @words ;     # make a string
        chomp $data ;                     # remove the newline
        $records[$count]{$field} = $data ;
    }

    print Dumper(\@records);              # easy way to unfold the structure

    __DATA__
    From: pminich@foo.com
    To: esquared@foofoo.com
    File: value.scr
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    Date: 06/30/2002 00:01:21
    From: mef@mememe.com
    To: inet@microsoft.com
    File: Nr.pif
    Action: The uncleanable file is deleted.
    Virus: WORM_KLEZ.H
    ----------------------------------
    You can now unfold the structure by looping through @records:
    foreach my $record (@records) {
        # $record is now a ref to the hash it contained

        # print a field from the record
        print "\nNew Record\n" ;
        print "FROM: $$record{From}, \n" ;

        # or loop through each key for the current record
        foreach my $key (sort keys %$record ) {
            print "$key => $$record{$key} \n" ;
        }
    }
    Hope it helps !
