I am trying to parse log files as efficiently as possible in Perl. In the following code snippet, I need to grab the first 18 whitespace-separated fields, the next 40 characters, another 40 characters, and then whatever remains of the line. The fields vary in width, as the test data string shows.
Is there a faster way to do this in Perl? Is there a better regular expression to grab the first 18 fields?
Without a loss of speed, can I create a class that blesses the regex and has methods for returning the elements of the log file line? What is the fastest way to process log files without using, for instance, inline C?
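To make the class idea concrete, here is a rough sketch of what I have in mind. The package name `LogLine` and the accessor names are made up for illustration, the sample line is synthetic (built to fit the layout, not taken from the real log), and the object is a blessed array of the captured pieces rather than the regex itself. I have not benchmarked it; I assume each method call adds some overhead compared with using the captures directly.

```perl
use strict;
use warnings;

# Hypothetical class name and accessors, for illustration only.
package LogLine;

# Same layout as the regex in my script, with a non-capturing inner group.
my $regex = qr/^-((?:\S+\s+){18})(.{40})(.{40})(.+)/;

# Returns an object on a match, or undef when the line does not fit the layout.
sub new {
    my ( $class, $line ) = @_;
    my @parts = $line =~ $regex or return;
    return bless \@parts, $class;
}

sub fields { return split ' ', $_[0][0] }   # the 18 leading fields as a list
sub chars1 { return $_[0][1] }              # first 40-character block
sub chars2 { return $_[0][2] }              # second 40-character block
sub remain { return $_[0][3] }              # rest of the line

package main;

# Synthetic line built to fit the layout; not taken from the real log.
my $line = '-' . join( ' ', 1 .. 18 ) . ' ' . ( 'A' x 40 ) . ( 'B' x 40 ) . 'tail 0 0';

if ( my $rec = LogLine->new($line) ) {
    my @f = $rec->fields;
    print scalar(@f), " fields, remainder: ", $rec->remain, "\n";
}
```

The leading fields are split only when `fields` is called, so a caller that needs just the fixed-width blocks does not pay for the split.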
Any assistance will be greatly appreciated. Thanks.
#!/usr/local/bin/perl -w
use strict;
my $testdata=<<TESTDATA;
-3 1 2 3 4 5 6657 7 8 9 10 11 12 13 14 15 16 20021013000000 NM 1 :
+ SR9550/1-SR9551/1 16S 1 12 LINE WEST
+ 0 0
-3 2 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013000000 Test021011
+ 0
+ 0
-3 3 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013000000 Test021011a
+ 0
+ 0
-3 4 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013000000 NM 1 :
+ SR9550/1-SR9551/1 16S 1 18 LINE EAST 0
+ 0
-3 5 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013001500 Test021011
+ 0
+ 0
-3 6 67 0 9 2 6657 2 1 0 0 0 0 6 131 0 0 20021013001500 NM 1 :
+ SR9550/1-SR9551/1 16S 1 12 LINE WEST 0
+ 0
-3 7 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013001500 Test021011a
+ 0
+ 0
-3 8 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013001500 NM 1 :
+ SR9550/1-SR9551/1 16S 1 18 LINE EAST 0
+ 0
-3 9 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013003000 Test021011
+ 0
+ 0
-3 10 67 0 9 2 6657 2 1 0 0 0 0 6 131 0 0 20021013003000 NM 1 :
+ SR9550/1-SR9551/1 16S 1 12 LINE WEST
+0 0
-3 11 67 0 0 2 6657 2 1 0 0 0 0 4 131 0 0 20021013003000 Test021011a
+
+0 0
-3 12 67 0 9 6 6657 2 1 0 0 0 0 6 131 0 0 20021013003000 NM 1 :
+ SR9550/1-SR9551/1 16S 1 18 LINE EAST
+0 0
TESTDATA
my @data = split /\n/, $testdata;

# Leading '-', 18 whitespace-separated fields, two 40-character blocks, then the rest.
my $regex = qr/^-((\S+\s+){18})(.{40})(.{40})(.+)/;

foreach my $line (@data) {
    next unless $line =~ /^-3/;      # only record lines start with -3
    next unless $line =~ $regex;     # skip non-matching lines instead of reusing stale captures

    # $2 holds only the last repetition of the inner group, so it is skipped.
    my ( $str_18_fields, $str_40_chars1, $str_40_chars2, $str_remain ) = ( $1, $3, $4, $5 );

    # Replace the first '|' in each 40-character block.
    $str_40_chars1 =~ s!\|!_!;
    $str_40_chars2 =~ s!\|!_!;

    print "\$str_18_fields = $str_18_fields\n";
    print "\$str_40_chars1 = $str_40_chars1\n";
    print "\$str_40_chars2 = $str_40_chars2\n";
    print "\$str_remain = $str_remain\n\n";
} # end foreach
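
One variation I have been wondering about, shown here only as an unbenchmarked sketch: let the regex find just the 18 leading fields (with a non-capturing inner group), then cut the fixed-width part with `unpack`. The name `$head_re` and the sample line are invented for this example. Unlike `.{40}`, `unpack` does not fail on a short line, so this assumes every record really is long enough.

```perl
use strict;
use warnings;

# Non-capturing inner group: only the capture that is actually used remains.
my $head_re = qr/^-((?:\S+\s+){18})/;

# Synthetic line built to fit the layout; not taken from the real log.
my $line = '-' . join( ' ', 1 .. 18 ) . ' ' . ( 'A' x 40 ) . ( 'B|' x 20 ) . 'tail 0 0';

if ( $line =~ $head_re ) {
    my $str_18_fields = $1;

    # $+[0] is the offset just past the match, i.e. the start of the fixed-width part.
    my ( $str_40_chars1, $str_40_chars2, $str_remain ) =
        unpack 'a40 a40 a*', substr( $line, $+[0] );

    # tr replaces every '|', whereas a single s!\|!_! replaces only the first one.
    tr/|/_/ for $str_40_chars1, $str_40_chars2;

    print "$str_40_chars2\n$str_remain\n";
}
```

The `a` template keeps trailing spaces, matching what `.{40}` would capture; `A` would strip them.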