Optimizing many many regex matches

BUU has asked for the wisdom of the Perl Monks concerning the following question:

I've recently come upon a mildly interesting problem without much immediate relevance, so I've been considering it but haven't gotten around to solving it. The situation is fairly simple: I have a list of roughly 40 to 100+ objects. Each object contains a 'regexp' value and a 'code' value. The idea is to execute each 'code' value whose corresponding regexp matches. The obvious and simple way I've been using looks like this:

my $string;

for( @regexes )
{
  if( my @matches = $string =~ $_->{regexp} )
  {
    $_->{code}->($string, @matches);
  }
}

But this strikes me as slightly inefficient in a number of ways. Surely the regexp engine could optimize this if only it knew about all the matches I was intending to perform. There are a number of modules on CPAN that advertise combining lots of regexps to do a simple boolean check, but I couldn't see an easy way to then map the result back to the appropriate 'code' value that's also in the data structure. Also, each regexp might have one or more capturing fields that I need to keep track of and pass to the code ref. Any thoughts?


Replies are listed 'Best First'.
Re: Optimizing many many regex matches by grinder (Bishop) on Oct 22, 2006 at 20:16 UTC
BUU, what you want is Regexp::Assemble, with a pattern created in "tracked" mode. This lets you assemble a single pattern against which to test strings, which is a big efficiency gain. You can then recover the original pattern that triggered the match and use that as a key into a dispatch table, passing in the captured fields. Have a look at `eg/ircwatcher` to get some ideas. A more advanced version appeared in Perl Hacks, if you happen to have a copy lying around.

• another intruder with the mooring in the heart of the Perl
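
To make the tracked-mode suggestion concrete, here is a minimal sketch of what it might look like. It assumes Regexp::Assemble is installed from CPAN and uses only its new, add, match, matched and capture methods as I understand them from the module's documentation. The %dispatch table, the dispatch_line helper, the @log array and the IRC-flavoured patterns are all made up for illustration and come from neither the original post nor eg/ircwatcher. One caveat: an assembled pattern reports a single triggering pattern per match, whereas the loop in the question runs every handler whose regexp matches, so this only fits when at most one pattern is expected to match a given string.

```perl
use strict;
use warnings;
use Regexp::Assemble;   # CPAN module, not part of core Perl

# Hypothetical handlers, keyed by the pattern text that should trigger them.
# Each handler receives the string followed by the captured fields.
my @log;
my %dispatch = (
    '^PING (\S+)$' => sub {
        my ($string, $server) = @_;
        push @log, "pong $server";
    },
    '^JOIN (\w+) (\w+)$' => sub {
        my ($string, $channel, $nick) = @_;
        push @log, "$nick joined $channel";
    },
);

# Tracked mode remembers which of the added patterns caused the match.
my $ra = Regexp::Assemble->new( track => 1 );
$ra->add($_) for keys %dispatch;

# Test one string against the assembled pattern and call the handler
# for whichever original pattern matched. Returns 1 if a handler ran.
sub dispatch_line {
    my ($string) = @_;
    return 0 unless $ra->match($string);
    my $handler = $dispatch{ $ra->matched }
        or return 0;    # matched() text did not map back to a key
    $handler->( $string, $ra->capture );
    return 1;
}
```

Calling dispatch_line('PING irc.example.org') should run the first handler with the server name as its captured field. The lookup is guarded because matched returns the pattern as the module reconstructs it, which is assumed here to be identical to the text that was added.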
Re: Optimizing many many regex matches by xdg (Monsignor) on Oct 22, 2006 at 22:25 UTC
This might be a situation where study might be helpful. Here's a section from the docs:

Takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character frequencies in the string to be searched--you probably want to compare run times with and without it to see which runs faster. Those loops that scan for many short constant strings (including the constant parts of more complex patterns) will benefit most.

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
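
The docs' advice to compare run times with and without study can be followed with the core Benchmark module. This is only a sketch: the test string, the fifty patterns and the count_matches helper are invented for the comparison and say nothing about BUU's real data. Note also that, as far as I know from perlfunc, study has been a no-op since Perl 5.16, so on a modern perl the two rows should come out roughly the same.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data: one long string and fifty patterns with constant parts.
my $string   = join ' ', map { "word$_" } 1 .. 5_000;
my @patterns = map { qr/word${_}7\b/ } 1 .. 50;

# Run every pattern against the text, optionally calling study first,
# and return how many of the patterns matched.
sub count_matches {
    my ($text, $use_study) = @_;
    study $text if $use_study;
    my $hits = 0;
    for my $re (@patterns) {
        $hits++ if $text =~ $re;
    }
    return $hits;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    with_study    => sub { count_matches($string, 1) },
    without_study => sub { count_matches($string, 0) },
} );
```

Swapping in the real strings and regexps is the only way to tell whether study makes a difference on the perl version actually in use.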
