note
lune
Basically, reading regexes from a file is no different from using predefined ones.
<p>
It boils down to the question of how to represent the matches and the number of matches efficiently.
<p>
I created some files with simplified test input to concentrate on the problem:
regexes.txt
<code>
ID1>>^a
ID2>>h$
ID3>>b
ID4>>[a-z]{9,10}
ID5>>[ah]
</code>
lines.txt
<code>
id_A: abcdefg
id_B: bcdefgh
id_C: cdefghijk
</code>
You will probably have to change the "split" statements to match the format of your input.
<p>
I store the matches in a hash that uses the regexes as keys and references to arrays of matching lines as values.
<code>
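#!/usr/bin/perl -w
use strict;
use autodie;

# Read "ID>>regex" pairs; the limit of 2 keeps any ">>" inside a regex intact.
open(my $regexefile, "<", "regexes.txt");
my @regexes = <$regexefile>;
close($regexefile);
chomp @regexes;
my %regexes = map { split(/>>/, $_, 2) } @regexes;

# Maps each regex to an array reference holding the lines it matched.
my %matches;

open(my $inputfile, "<", "lines.txt");
while (<$inputfile>) {
    chomp;
    # Drop the leading "id_X:" label and keep only the text to match.
    my (undef, $line) = split(/ /, $_, 2);
    next unless defined $line;
    foreach my $regex (values %regexes) {
        if ($line =~ /$regex/) {
            # Autovivification creates the array on the first push.
            # push on a bare reference no longer works since Perl 5.24.
            push(@{ $matches{$regex} }, $line);
        }
    }
}
close($inputfile);

# Only regexes with at least one match ever become keys of %matches.
foreach my $regex (sort keys %matches) {
    my $matches = $matches{$regex};
    print "$regex: No of matches " . scalar @$matches . "\n";
    foreach my $match (@$matches) {
        print "matched $match\n";
    }
}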
</code>
</p>
Update: added autodie; warnings are already active because of -w.
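<p>
If you would rather report the results per ID than per regex, one possible variation is to compile each pattern once with qr// and key both hashes by ID. The following is only a sketch under assumptions: the arrays @regex_lines and @input_lines are hypothetical in-memory stand-ins for regexes.txt and lines.txt, and the names %compiled and %matches_by_id are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory copies of regexes.txt and lines.txt.
my @regex_lines = ('ID1>>^a', 'ID2>>h$', 'ID3>>b', 'ID4>>[a-z]{9,10}', 'ID5>>[ah]');
my @input_lines = ('id_A: abcdefg', 'id_B: bcdefgh', 'id_C: cdefghijk');

# Compile each pattern once with qr// and key it by its ID.
my %compiled;
foreach my $entry (@regex_lines) {
    my ($id, $pattern) = split(/>>/, $entry, 2);
    $compiled{$id} = qr/$pattern/;
}

# Collect the matching lines per regex ID.
my %matches_by_id;
foreach my $input (@input_lines) {
    # Drop the leading "id_X:" label.
    my (undef, $line) = split(/ /, $input, 2);
    foreach my $id (sort keys %compiled) {
        push @{ $matches_by_id{$id} }, $line if $line =~ $compiled{$id};
    }
}

# Report the count and the matched lines for every ID that matched.
foreach my $id (sort keys %matches_by_id) {
    my @found = @{ $matches_by_id{$id} };
    print "$id: No of matches " . scalar(@found) . " (@found)\n";
}
```

Keying by ID keeps two different IDs apart even if they happen to carry the same regex, and the number of matches is still just the size of the array behind each key.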
1063935