 
PerlMonks  


IMO the only reason R::L beats R::A is the char class assertion that R::L uses. I suspect you would find that R::A performs equally well once you add this. The other thing is that R::A will change the order in which the words are matched, and R::L won't. I'm not sure whether this is intentional, although I can see it being a nice idea to sort the strings by the frequency of their leading char, at least when matching some kind of "normal" text.
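
To make the idea concrete, here is a minimal sketch of such an assertion, assuming it takes the form of a lookahead on the set of first characters. The helper name guarded_alternation is hypothetical and is not taken from R::L or R::A; it only handles literal words and keeps them in their original order.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of R::L or R::A): build an alternation of
# literal words guarded by a lookahead on the set of their first characters.
sub guarded_alternation {
    my @words = grep { length } @_;          # ignore empty strings
    die "need at least one word\n" unless @words;

    my %first;
    $first{ substr $_, 0, 1 } = 1 for @words;

    # quotemeta keeps characters such as ']' or '-' safe inside the class
    my $class = join '', map { quotemeta($_) } sort keys %first;
    my $alt   = join '|', map { quotemeta($_) } @words;

    # The lookahead rejects most start positions with a single class test,
    # before any of the alternatives is tried.
    return qr/(?=[$class])(?:$alt)/;
}

my $re = guarded_alternation(qw(foo bar baz));
print "matched '$1'\n" if 'xxbazxx' =~ /($re)/;   # prints: matched 'baz'
```

Any speed difference would need to be benchmarked on real input; the sketch only shows the shape of the pattern.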

As for my trie work and the upcoming Aho-Corasick patch, you are right: it doesn't deal with metacharacters and it probably never will. This is why modules like yours will always have a place on CPAN: there are too many types of pattern manipulation that can occur during optimisation to put them all in perl. The cost of such optimisations is shared across every regex compiled or executed, so building in rarely useful optimisations doesn't make sense; the few cases that get improved are outweighed by the cases that are slowed down by the extra optimisation logic. Modules like yours, however, only come into play at the user's request and as such can be far more "aggressive" in what they do.

Having said that, I really hope you take the time to educate R::A about the new trie optimisation, so that it takes advantage of the optimisation whenever it can. A good example would be converting a list of patterns such as ('foo','ba[rz]') into /foo|bar|baz/, which will match much faster.
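
As a rough illustration of that conversion, the sketch below expands patterns made only of alphanumeric literals and simple classes such as [rz] into plain strings. The helper name expand_simple is hypothetical, and this is not how R::A works internally; anything more complex (ranges, negation, quantifiers) is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a pattern consisting only of alphanumeric
# literals and simple classes like [rz] into the strings it can match.
# Returns an empty list when the pattern contains anything else.
sub expand_simple {
    my ($pat) = @_;
    my @out = ('');
    while (length $pat) {
        my @choices;
        if ($pat =~ s/^\[([A-Za-z0-9]+)\]//) {    # class of plain chars
            @choices = split //, $1;
        }
        elsif ($pat =~ s/^([A-Za-z0-9])//) {      # single literal char
            @choices = ($1);
        }
        else {
            return;                                # too complex: give up
        }
        # Append every choice to every string built so far.
        @out = map { my $head = $_; map { $head . $_ } @choices } @out;
    }
    return @out;
}

# Fall back to the original pattern when it cannot be expanded.
my @alts = map { my @e = expand_simple($_); @e ? @e : $_ } ('foo', 'ba[rz]');
my $re = join '|', @alts;
print "$re\n";    # prints: foo|bar|baz
```

The result is a pure alternation of literal strings, which is the form the trie optimisation described above applies to.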

---
demerphq


In reply to Re^3: build regexp on a list of patterns by demerphq
in thread build regexp on a list of patterns by mod_alex
