PerlMonks  

Re: Efficiency: Foreach loop and multiple regexs

by Dinosaur (Beadle)
on Sep 13, 2002 at 18:40 UTC ( [id://197691]=note )


in reply to Efficiency: Foreach loop and multiple regexs

Just to clarify before I start: what you want is to find the subset of @array elements that match at least one of the regexes, right?

Regex::PreSuf suggested above looks interesting, if your regexes really are just a list of words to match.

Otherwise, no matter which way you nest the loops, you can expect to do about (140*20000)/2 = 1,400,000 regex matches on average (/2 because you get out on the first match; an element that matches nothing still costs all 140 tests). You can optimise the loop itself (e.g., with study, suggested above), but the real payoff has to come from running it far fewer times.
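To make the counting concrete, here is a minimal sketch of the nested loop with an early exit. The three regexes and five data entries are made-up stand-ins for the real 140 and 20000, and the names @regexes, @matched and $tests are my own, not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the 140 regexes and the 20000 data entries.
my @regexes = (qr/foo/, qr/^bar/, qr/\d{3}/);
my @array   = ('foo one', 'nothing here', 'barbell', 'code 123', 'xyz');

my $tests = 0;   # number of regex matches actually attempted
my @matched;     # elements that match at least one regex

ELEMENT: for my $elem (@array) {
    for my $re (@regexes) {
        $tests++;
        if ($elem =~ $re) {
            push @matched, $elem;
            next ELEMENT;   # get out on the first match
        }
    }
}

print "matched: @matched\n";
print "regex tests run: $tests\n";
```

Watching $tests while you shuffle the regex list gives a direct measure of how much work a given ordering saves.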

To do that, you might think about ordering the regex list. (Here I'm assuming that the regex list is constant and the data varies from run to run.) If you know something about the structure of the incoming data, you should be able to guess with some accuracy which regexes are likely to match the most data entries. Put those at the front of the list. Depending on how much trouble you're willing to go to over this, you might even run some experiments on the data to find out which regexes match most often.
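One way to run such an experiment, as a rough sketch: count how many entries of a sample each regex matches, then sort the list so the biggest hitters come first. The sample data and the names @sample, @hits, @order and @sorted are invented for illustration. Note that raw hit counts are only an approximation, since with the early exit what really matters is how many entries a regex catches that the earlier ones missed.

```perl
use strict;
use warnings;

# Hypothetical regex list and a sample of typical incoming data.
my @regexes = (qr/rare/, qr/\d/, qr/common/);
my @sample  = ('common 1', 'common 2', 'common thing', 'rare 7', 'other');

# For each regex, count how many sample entries it matches.
my @hits = map { my $re = $_; scalar grep { $_ =~ $re } @sample } @regexes;

# Indices of the regexes, most hits first; ties keep their original order.
my @order = sort { $hits[$b] <=> $hits[$a] or $a <=> $b } 0 .. $#regexes;

# The reordered list to use in the real matching loop.
my @sorted = @regexes[@order];

print "hits per regex: @hits\n";
print "new order (original indices): @order\n";
```

Since the regex list is assumed constant, this only needs to be done once, offline, and the resulting order can be hard-coded.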

Also consider ranking the regexes fastest-first, a determination you can probably get pretty close to by eyeball.

--Dinosaur
