Re: Efficiency: Foreach loop and multiple regexs

Just to clarify before I start: What you want is to find the subset of @array elements which match any regex -- right?

Regex::PreSuf suggested above looks interesting, if your regexes really are just a list of words to match.
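
If the patterns really are literal words, a core-only approximation of that idea is to fold them into a single alternation, so each element is tested once rather than once per word. A minimal sketch follows; @words and @array are made-up sample data, and this is a plain join of escaped words, not the prefix/suffix factoring that Regex::PreSuf performs:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @words = qw(apple banana cherry);
my @array = ('green apple', 'red brick', 'cherry pie', 'blue sky');

# quotemeta escapes any regex metacharacters in the literal words.
my $alternation = join '|', map { quotemeta($_) } @words;
my $combined    = qr/(?:$alternation)/;

# One regex test per element instead of one per word.
my @matched = grep { $_ =~ $combined } @array;

print "$_\n" for @matched;
```

Whether the combined pattern actually beats the loop depends on the words and on the Perl version, so it is worth timing on real data.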

Otherwise, no matter which way you nest the loops, you can expect to do about (140*20000)/2 regex matches on average (/2 because you bail out at the first match; an element that matches nothing still costs a full pass through the list). You can optimise the loop body (e.g., with study, suggested above), but the real payoff has to come from running it far fewer times.
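
For reference, the early-exit loop being discussed might look roughly like this. It is only a sketch: @regexes and @array are invented stand-ins for the real lists, and the $tests counter is there purely to show how many matches are actually attempted:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @regexes = (qr/^foo/, qr/bar$/, qr/\d+/);
my @array   = ('foo1', 'xbar', 'abc', 'a42', 'plain');

my @matched;
my $tests = 0;    # how many regex matches were actually attempted

ELEMENT: for my $item (@array) {
    for my $re (@regexes) {
        $tests++;
        if ($item =~ $re) {
            push @matched, $item;
            next ELEMENT;    # stop at the first regex that matches
        }
    }
}

printf "%d of %d elements matched, using %d tests (worst case %d)\n",
    scalar @matched, scalar @array, $tests, @array * @regexes;
```

Elements that match nothing run through every regex, which is why the saving from early exit depends on how much of the data matches and how early in the list.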

To do that, you might think about ordering the regex list. (Here I'm assuming that the regex list is constant and the data varies from run to run.) If you know something about the structure of the incoming data, you should be able to guess with some accuracy which regexes are likely to match the most data entries. Put those at the front of the list. Depending on how much trouble you're willing to go to over this, you might even run some experiments on sample data to find out which regexes match most often.
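
One way to run that kind of experiment is to count hits per regex over a sample of the data and sort on the counts. This is a rough sketch with invented @regexes and @sample; raw hit count is only a simple heuristic, since it ignores overlap between regexes:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @regexes = (qr/^zzz/, qr/\d/, qr/^a/);
my @sample  = ('a1', 'b2', 'c3', 'abc', 'zzz');

# Count how many sample entries each regex matches.
my @hits;
for my $i (0 .. $#regexes) {
    $hits[$i] = grep { $_ =~ $regexes[$i] } @sample;
}

# Most frequently matching regex first; ties keep their original order.
my @order   = sort { $hits[$b] <=> $hits[$a] || $a <=> $b } 0 .. $#regexes;
my @ordered = @regexes[@order];

print "regex $_ matched $hits[$_] sample entries\n" for @order;
```

The reordered @ordered list can then be used in place of the original in the matching loop.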

Also consider ranking the regexes fastest-first, a determination you can probably get pretty close to by eyeball.

--Dinosaur

	PerlMonks