Re: line by line match on an array of strings

Is there a more elegant solution?

Substituting speed for elegance, there are many quick wins for this sort of thing. Precompile your regexen with qr//, so each iteration doesn't compile them anew. Break out of the loop after the first successful match, if appropriate (the List::MoreUtils any approach does this, I think). Optimize the regex list and its application: can you profitably combine the patterns, add anchors such as \b or ^, etc? Keep track of regex hit counts and re-sort your regex list now and then so the most common matches are tried first, if appropriate.
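A minimal sketch of the first few of these ideas, assuming the patterns live in an array called @patterns (the pattern strings and sub names here are made up for illustration, not taken from the original post):

```perl
use strict;
use warnings;

# Hypothetical pattern list; in practice this would be the few hundred strings.
my @patterns = ('foo\d+', 'bar', 'ba+z');

# Compile each pattern once, outside the per-line loop.
my @compiled = map { qr/$_/ } @patterns;

# Alternatively, fold them into one alternation so each line is scanned once.
my $joined   = join '|', map { "(?:$_)" } @patterns;
my $combined = qr/$joined/;

# Return the index of the first pattern that matches, or -1 if none does.
sub first_match_index {
    my ($line) = @_;
    for my $i (0 .. $#compiled) {
        return $i if $line =~ $compiled[$i];   # stop at the first hit
    }
    return -1;
}

# Return 1 if any pattern matches, using the single combined regex.
sub matches_any {
    my ($line) = @_;
    return $line =~ $combined ? 1 : 0;
}
```

Whether the combined alternation actually beats the loop depends on the patterns and the perl version, so it is worth benchmarking both against real data.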

The biggest win typically comes from rethinking the problem, of course. Without really knowing what you're attempting, it looks as though you might be trying to do some comparatively simple token matches. Something like

while (<INPUT>) {
 # /x makes the whitespace inside the pattern insignificant
 if (/\b keyword \s+ (\w+)/x and exists $keywords{ $1 }) {
  # .. do something with the token in $1
 }
}

might do the trick. We'd need to see specific examples to give more specific advice. -Mike

Re^2: line by line match on an array of strings
by WoodyWeaver (Monk) on Jan 09, 2008 at 21:38 UTC

> however with a few hundred thousand lines to search, and an array of a few hundred it is far too slow.

It could be slow because it has to do a lot of work at each individual step, which is where optimizing the regex helps.

I think it is slow because your looping is of order (a few hundred thousand) TIMES (a few hundred).

It would be much better if the looping were of order (a few hundred thousand) times a big constant. You might be able to get away with that by following the advice to 'precompile your regexen' (wonderful phrase), or, imho more likely, if your line can be broken into a small number of tokens, by just doing a dispatch table lookup on the tokens broken out from the line.
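A rough sketch of the dispatch table idea, assuming lines split cleanly on whitespace and that the interesting strings are whole tokens (the ERROR/WARN tokens, the %seen tally, and the sub name are hypothetical):

```perl
use strict;
use warnings;

my %seen;   # per-token tallies, filled in by the handlers below

# Hypothetical dispatch table: token => handler.
my %dispatch = (
    ERROR => sub { $seen{ERROR}++ },
    WARN  => sub { $seen{WARN}++ },
);

# Split a line on whitespace and run the handler for each known token.
# Returns the number of tokens that had a handler.
sub process_line {
    my ($line) = @_;
    my $hits = 0;
    for my $token (split ' ', $line) {
        my $handler = $dispatch{$token} or next;   # one hash lookup per token
        $handler->($line);
        $hits++;
    }
    return $hits;
}
```

The cost per line then depends on the number of tokens in the line rather than on the number of strings being searched for, which is the point of the reordering described above. It only applies if the search strings really are whole tokens; substrings or true regexes would still need pattern matching.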

--woody

	PerlMonks