I would like to add some random thoughts I had when I saw your code.
First of all, the construct
foreach (qw(name to file action virus)) {
    $gCurRec->{$_} = '';
}
can be expressed very succinctly using so-called hash slices, i.e.
my @columns = qw(name to file action virus);
@{ $gCurRec }{ @columns } = ('') x @columns;
See for example
this for a good introduction.
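As a small self-contained sketch of the slice syntax (the names $rec and %plain are made up for this illustration), the same initialisation works for a hash reference and for a plain hash:

```perl
use strict;
use warnings;

my @columns = qw(name to file action virus);

# Hash reference: dereference with @{ ... } and slice with { LIST }.
my $rec = {};
@{ $rec }{ @columns } = ('') x @columns;

# Plain hash: the slice takes the @ sigil because it yields a list.
my %plain;
@plain{ @columns } = ('') x @columns;

# Both hashes now hold the five keys, each mapped to the empty string.
print join(',', sort keys %$rec), "\n";   # action,file,name,to,virus
```

The repetition ('') x @columns evaluates @columns in scalar context, so the list of empty strings always has as many elements as there are columns.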
Furthermore, why do you use a hash reference to store the data when a hash would be sufficient? (This is probably a matter of style.)
Then, I usually consider multiple repeated lines with trivial differences like
$gCurRec->{name}=$1 if (/^From:\s*(.+?)\s*$/);
$gCurRec->{to}=$1 if (/^To:\s*(.+?)\s*$/);
to be a sign that some kind of abstraction like a loop is needed. In this case, keying each datum by its header field
/^(\w+):\s*(.+?)\s*$/ and $gCurRec->{lc $1} = $2;
does so and furthermore removes the need to spell out the interesting header fields several times. Note that the sender is then keyed as from rather than name. Unknown fields like Date: are captured too, but since they are never printed they are effectively ignored, just as in your code.
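To see the generic match at work on a few hard-coded lines, here is a minimal sketch (the sample lines and the names @lines and %rec exist only for this illustration):

```perl
use strict;
use warnings;

# Sample header lines, as if already read from the log and chomped.
my @lines = (
    'Date: 06/30/2002 00:01:21',
    'From: mef@mememe.com',
    'To:   inet@microsoft.com  ',
);

my %rec;
for (@lines) {
    # $1 is the header name, $2 its value with surrounding whitespace trimmed.
    /^(\w+):\s*(.+?)\s*$/ and $rec{lc $1} = $2;
}

print "$_ => $rec{$_}\n" for sort keys %rec;
```

Every line of the form "Word: value" is stored, so a field is only ignored in the sense that nothing later reads its key.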
So finally here is my attempt at implementing your algorithm:
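#!/usr/bin/perl -w
use strict;

my %gCurRec = ();
while (<DATA>) {
    # A line of dashes ends a record: print it tab-separated, then reset.
    /^-+\s*$/ and do {
        print join("\t",
            map { exists $gCurRec{$_} ? $gCurRec{$_} : '' }
                qw(from to file action virus)
        ) . "\n";
        %gCurRec = ();
        next;
    };
    # Store any "Header: value" line under its lowercased header name.
    /^(\w+):\s*(.+?)\s*$/ and $gCurRec{lc $1} = $2;
}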
__DATA__
From: pminich@foo.com
To: esquared@foofoo.com
File: value.scr
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
Date: 06/30/2002 00:01:21
From: mef@mememe.com
To: inet@microsoft.com
File: Nr.pif
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------