PerlMonks  
I wonder on the use of the availability of so many similar but not quite identical modules

My rationale for writing Regexp::Assemble was that when I started looking around for something to do what I needed, I didn't find what I wanted. Regex::PreSuf does not recognise metacharacters, so a\dz and a\sb produce something odd, like a\(?:dz|sb), which does not compile: the backslash is split off from its metacharacter and ends up escaping the opening parenthesis instead.
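To make the failure concrete, here is a minimal sketch of the difference between factoring a common prefix character by character and factoring it atom by atom. It is written in Python with only the standard library, not Perl, and it is not the code of Regex::PreSuf or Regexp::Assemble. The `tokens` and `factor` helpers are hypothetical names invented for this illustration, and the tokenizer only understands single-character backslash escapes.

```python
import re

# Hypothetical tokenizer: a backslash escape such as \d counts as one atom,
# any other character counts as an atom by itself.
TOKEN = re.compile(r'\\.|.', re.DOTALL)

def tokens(pattern):
    return TOKEN.findall(pattern)

def common_prefix_len(seqs):
    """Number of leading items shared by every sequence."""
    n = 0
    for group in zip(*seqs):
        if len(set(group)) != 1:
            break
        n += 1
    return n

def factor(seqs):
    """Pull the shared leading items out in front of an alternation."""
    n = common_prefix_len(seqs)
    head = ''.join(seqs[0][:n])
    tails = '|'.join(''.join(s[n:]) for s in seqs)
    return f'{head}(?:{tails})'

patterns = [r'a\dz', r'a\sb']

# Character-level factoring splits the backslash from its letter.
naive = factor([list(p) for p in patterns])   # a\(?:dz|sb)

# Atom-level factoring keeps \d and \s whole.
aware = factor([tokens(p) for p in patterns])  # a(?:\dz|\sb)

for label, pat in (('naive', naive), ('aware', aware)):
    try:
        re.compile(pat)
        print(label, pat, 'compiles')
    except re.error as err:
        print(label, pat, 'fails:', err)
```

The character-level result is the same a\(?:dz|sb) quoted above. Python's regex flavour is not Perl's, but both reject it for the same reason: the closing parenthesis has no opening partner.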

Regexp::Optimizer came closer to what I wanted, in that it understands metacharacters, but it doesn't do what I call tail folding, that is, merging the common endings of alternatives as well as their common beginnings. I have a large list of regular expressions that tend to share common tails and I was interested in producing the shortest expression possible (even though this can result in a more complex pattern that may actually perform a bit more slowly than if the tails were left alone).
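As a rough picture of what tail folding buys, the sketch below pulls a shared ending out of a flat alternation. It is Python rather than Perl, it only handles literal words with one tail common to all of them, and `fold_tails` is a hypothetical helper written for this example. Regexp::Assemble's real algorithm is more general than this.

```python
import re

def common_suffix_len(words):
    """Number of trailing characters shared by every word."""
    n = 0
    for group in zip(*(reversed(w) for w in words)):
        if len(set(group)) != 1:
            break
        n += 1
    return n

def fold_tails(words):
    """Move the shared ending outside the alternation."""
    n = common_suffix_len(words)
    first = words[0]
    tail = first[len(first) - n:]
    heads = '|'.join(w[:len(w) - n] for w in words)
    return f'(?:{heads}){tail}'

words = ['running', 'jumping', 'singing']

plain = '(?:' + '|'.join(words) + ')'   # (?:running|jumping|singing)
folded = fold_tails(words)              # (?:runn|jump|sing)ing

print(len(plain), plain)
print(len(folded), folded)

# Both forms accept exactly the same words.
for w in words:
    assert re.fullmatch(plain, w) and re.fullmatch(folded, w)
```

The folded pattern is shorter, which is the gain being chased here, at the price of an extra group for the engine to walk through.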

For instance, on a sample of 4000 dictionary words weighing about 46Kb, the resulting R::A pattern is 23Kb long, whereas the R::O pattern is 28Kb long. For me the gain of 5Kb was an important consideration.

Another important issue is that while I consider Regexp::Assemble to be slow, it is no slouch. It assembles 3000 complex patterns into one in about 1.2 seconds. I fed the same set of patterns to Regexp::Optimizer, and 850 CPU seconds later it was still grinding away. (update: still chugging away, now at 11500 seconds. I am beginning to wonder whether it will terminate. 5Mb of core, so it's not swapping... later: 11 CPU hours later, Regexp::Optimizer is still running. One may reasonably conclude that it doesn't work on large pattern sets. There must be something exponential happening.)

I look on with interest at demerphq's work on implementing tries in the regular expression engine, but as it doesn't deal with metacharacters at the moment, its usefulness is somewhat limited for what I'm doing.

- another intruder with the mooring in the heart of the Perl


In reply to Re^2: build regexp on a list of patterns by grinder
in thread build regexp on a list of patterns by mod_alex
