Re: Bulk Regex?

qr// is sweet: it compiles each pattern once, up front. One big alternation pattern tends to be slower than many small patterns tried one at a time.
UNTESTED CODE:

my @codes = qw/ CODE1 CODE2 CODE3 /;
my @regex = map { qr/$_/i } @codes;

while (my $inputline = <FILE>) {
    my $found = 0;
    foreach my $regex (@regex) {
        # list assignment in boolean context is true only when the match succeeds
        last if ($found) = $inputline =~ /($regex)/;
    }
    next unless $found;
    
    # logic goes here
}

Make sure you put the most common codes at the front of your @codes array for more speed, since you'll do less searching.
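Whether many small patterns really beat one big alternation depends on your Perl version and your data, so it is worth measuring before committing either way. Here is a rough sketch using the core Benchmark module. The codes and the sample lines are made up for illustration, and count_small / count_big are hypothetical helper names, not anything from the original question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical codes and input lines, for illustration only.
my @codes = qw/ CODE1 CODE2 CODE3 /;
my @lines = map { sprintf "line %d has CODE%d in it\n", $_, $_ % 5 } 1 .. 1000;

# Many small precompiled patterns, tried in order.
my @regex = map { qr/\Q$_\E/i } @codes;

# One big alternation built from the same codes.
my $alternation = join '|', map { quotemeta } @codes;
my $big = qr/$alternation/i;

# Count matching lines by trying each small pattern until one hits.
sub count_small {
    my $hits = 0;
    LINE: for my $line (@lines) {
        for my $re (@regex) {
            if ($line =~ $re) { $hits++; next LINE }
        }
    }
    return $hits;
}

# Count matching lines with the single alternation pattern.
sub count_big {
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if $line =~ $big;
    }
    return $hits;
}

# Both approaches must agree before the timings mean anything.
die "mismatch" unless count_small() == count_big();

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    small => \&count_small,
    big   => \&count_big,
});
```

Swap in your real codes and a chunk of your real input; the ratio of hits to misses and the order of @codes can change the result a lot.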

--
perl -e "print qq/just another perl hacker who doesn't grok japh\n/"
simeon2000|http://holdren.net/


Re: Re: Bulk Regex?
by RMGir (Prior) on Aug 30, 2002 at 12:40 UTC

Make sure you put the most common codes at the front of your @codes array for more speed, since you'll do less searching.

Interesting idea... I wonder if dynamically resorting as you go would help?

my @codes = qw/ CODE1 CODE2 CODE3 /;
my %hitCounts;
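# key the counts by the compiled regex, since that is what the loop and the sort use
my @regex = map { my $re = qr/$_/i; $hitCounts{$re} = 1; $re } @codes;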

# tune this parameter for optimal performance, balancing better ordering
# of regexen with sort costs...
my $resortFreq=1000;

my $iterCount=0;

while (my $inputline = <FILE>) {
    my $found = 0;
    foreach my $regex(@regex) {
        if ($inputline =~ /$regex/) {
            $hitCounts{$regex}++;
            $found = 1;
            last;
        }
    }
    # re-sort every 1000 lines.  The "1000" is a parameter that prolly should
    # be tuned
    if(++$iterCount%$resortFreq == 0) {
        @regex=sort {$hitCounts{$b}<=>$hitCounts{$a}} @regex;
    }
    next unless $found;
    # logic goes here
}

END {
   print "Regexen in sorted order:\n\t";
   print join "\n\t",sort {$hitCounts{$b}<=>$hitCounts{$a}} @regex;
   print "\n";
}