The appropriate solution to this problem depends on how precise the pattern matching needs to be, and on how much post-extraction processing you are willing to do. For example, do you need '58bn5904', or are you content with 'd:\data\58bn5904.dat'?
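As a rough illustration of the looser end of that trade-off, here is a minimal sketch that grabs the whole path with a permissive match and trims it afterwards. The helper name sent_file_basename is made up for this example, and it assumes the path never contains whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: match the path loosely, then trim it in a second step.
sub sent_file_basename {
    my ($line) = @_;

    # Loose match: any run of non-whitespace after "Sent file".
    my ($path) = $line =~ /Sent file (\S+)/ or return;

    (my $base = $path) =~ s{^.*\\}{};   # drop everything up to the last backslash
    $base =~ s{\.[^.]+\z}{};            # drop the extension
    return $base;
}

my $line = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) '
         . 'Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';

print sent_file_basename($line), "\n";   # 58bn5904
```

This is forgiving about the rest of the line, which is convenient but also means malformed lines will not be rejected.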
To give you an idea of how ugly the regex could become:
use strict;
use warnings;
# Example line:
# e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
# Desired:
# beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
my $re_date = qr<
(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)
\s
\d{1,2} # Day of month
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) # Month
\d{2} # Two digit year
\s
\d{2}:\d{2}:\d{2}
>x;
my $pattern = qr<
e:\\logfiles\\(.*?) # Capture(1) filename
\s
\[\d+\] # Bracketed number
\s
($re_date) # Capture(2) date
\s - \s
\(\d+\) # number in parens
\s
Sent \s file \s
d:\\data\\(.*?)\.dat # Capture(3) file basename
\s
successfully
\s
\(
[0-9.]+
\s
[A-Z]b
/sec
[ ] - [ ]
(\d+ \s bytes) # Capture(4) bytes text
\)
>x;
while (my $line = <DATA>) {
    if ($line =~ /$pattern/) {
        my ($logfile, $date, $file_basename, $bytes) = ($1, $2, $3, $4);
        printf "(%s) (%s) (%s) (%s)\n", $logfile, $date, $file_basename, $bytes;
    }
}
__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
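The script above prints each capture in parentheses, and capture 1 still carries the .log extension. One possible variation that produces the desired comma-separated line is sketched below. It is only a sketch: the date match is deliberately looser than $re_date, named captures need Perl 5.10 or later, and format_line is a made-up helper name.

```perl
use strict;
use warnings;

# Named captures (Perl 5.10+) label each field inside the pattern itself.
my $pattern = qr{
    e:\\logfiles\\ (?<logfile> [^\\\s]+? ) \.log    # log name without extension
    \s \[\d+\] \s
    (?<date> \w{3} \s \d{1,2} [A-Za-z]{3} \d{2} \s \d{2}:\d{2}:\d{2} )
    \s - \s \(\d+\) \s
    Sent \s file \s
    d:\\data\\ (?<basename> [^\\\s]+? ) \.dat       # file basename
    \s successfully \s
    \( [0-9.]+ \s [A-Z]b/sec [ ] - [ ]
    (?<bytes> \d+ \s bytes )
    \)
}x;

# Hypothetical helper: returns the comma-separated line, or undef on no match.
sub format_line {
    my ($line) = @_;
    return undef unless $line =~ $pattern;
    my %field = %+;    # copy the named captures before they go out of scope
    return join ',', @field{qw(logfile date basename bytes)};
}

my $line = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) '
         . 'Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';

print format_line($line), "\n";
# beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
```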
I have been meaning to learn Parse::RecDescent for ages, so tonight I took some time to try and solve your problem with it. It is likely the wrong tool for this job, and definitely a poor implementation; I would welcome any feedback from people with stronger parse-fu.
use strict;
use warnings;
use Parse::RecDescent;
$::RD_HINT=5;
my $grammar = <<'GRAMMAR';
{
use strict;
use warnings;
}
logfile : 'e:\\logfiles\\' /[-A-Za-z0-9_.]+/
{ $item[2] }
date :
m{
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
\s
\d\d
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
\d\d
}x
time : /\d{2}:\d{2}:\d{2}/
sentfile:
<skip:''>
'd:\\data\\'
/[-A-Za-z0-9_]+/
'.dat'
{ $item[3] }
rate : /\d+\.\d [A-Za-z]+\/sec/
bytecount : /\d+ bytes/
parse :
logfile
/\[\d+\]/
date
time
/- \(\d+\) Sent file /
sentfile
<skip:'[- \t()]*'> (
/successfully/
rate
)
bytecount
{ [ @item{qw(logfile date time sentfile bytecount)}] }
GRAMMAR
# Expect: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
my $parser = Parse::RecDescent->new($grammar);
use Data::Dumper;
while (my $line = <DATA>) {
    last unless $line =~ /\S/;
    my @fields = $parser->parse($line);
    # parse() returns undef on failure, so test the element rather than the list
    if (defined $fields[0]) {
        print Dumper \@fields;
    }
}
__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
Output:
$VAR1 = [
[
'beardstownbase.log',
'Thu 22Jun06',
'08:07:19',
'58bn5904',
'859216 bytes'
]
];
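The dumped structure still needs a little massaging to reach the expected line (the log name keeps its .log extension, and date and time are separate fields). A minimal sketch of that last step, assuming the field order shown in the output above; to_csv_line is a hypothetical helper, not part of Parse::RecDescent:

```perl
use strict;
use warnings;

# Hypothetical helper: $fields is one inner array reference from the parser
# result, in the order logfile, date, time, sentfile, bytecount.
sub to_csv_line {
    my ($fields) = @_;
    my ($logfile, $date, $time, $sentfile, $bytecount) = @$fields;

    $logfile =~ s/\.log\z//i;    # drop the extension the grammar keeps
    return join ',', $logfile, "$date $time", $sentfile, $bytecount;
}

# Same shape as the Dumper output: an array holding one array reference.
my @fields = (
    [ 'beardstownbase.log', 'Thu 22Jun06', '08:07:19', '58bn5904', '859216 bytes' ],
);

print to_csv_line($_), "\n" for @fields;
# beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
```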