http://qs321.pair.com?node_id=442769


in reply to Regexp do's and don'ts

You should recommend that people avoid constructs like [Jj][Aa][Vv][Aa], as they are quite inefficient and can also defeat various optimizations just by their presence. It's better to write that as (?i:Java). Also, up until 5.9.2 perl doesn't optimise alternations very well, so it's advisable to use modules like Regexp::List or the like to preprocess

/Lists|of|words/
. OTOH, as of 5.9.2 perl _does_ optimize them, so using things like Regexp::List will only slow down your patterns (I'm hopeful that by 5.10 these modules will be updated to Do The Right Thing Regardless™).
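
To illustrate the first point, here is a minimal sketch showing that the character-class form and (?i:Java) accept the same strings. The sample strings and the patterns_agree helper are made up for illustration. It only checks that the two patterns match the same input and says nothing about speed.

```perl
use strict;
use warnings;

# Two ways to match "java" in any letter case.
my $char_classes = qr/[Jj][Aa][Vv][Aa]/;   # the form to avoid
my $inline_flag  = qr/(?i:Java)/;          # the recommended form

# Hypothetical helper: returns 1 if both patterns agree on every
# sample string, 0 as soon as they disagree on one.
sub patterns_agree {
    my @samples = @_;
    for my $sample (@samples) {
        my $by_classes = $sample =~ $char_classes ? 1 : 0;
        my $by_flag    = $sample =~ $inline_flag  ? 1 : 0;
        return 0 if $by_classes != $by_flag;
    }
    return 1;
}

# Illustrative inputs only.
my @samples = qw(Java JAVA jAvA javascript jav perl);
print patterns_agree(@samples) ? "patterns agree\n" : "patterns differ\n";
```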

In fact, from that version on it is recommended, where at all possible, that you use alternations instead of quantifiers and grouping. I.e.,

/(cars|cart|carry|car)/
will be more efficient than
/(car([st]|ry)?)/
as of 5.9.2, and in some circumstances massively more efficient.
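
As a rough sketch of the two forms above: the plain alternation can be built straight from a word list with core Perl, and it captures the same text in $1 as the hand-factored pattern. The @words list, the test strings and the first_capture helper are assumptions made for this example. The // operator needs perl 5.10 or later.

```perl
use strict;
use warnings;

# Word list taken from the example patterns above.
my @words = qw(cars cart carry car);

# Build a plain alternation: longest words first, so that "car" cannot
# win before "carry" gets a chance; quotemeta escapes metacharacters.
my $alternatives = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b } @words;
my $alternation = qr/($alternatives)/;

# The hand-factored form.
my $factored = qr/(car([st]|ry)?)/;

# Hypothetical helper: what does the pattern capture in $1 for this text?
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# Illustrative inputs only.
for my $text ('a cart', 'carry on', 'cars', 'scar', 'cat') {
    my $alt = first_capture($alternation, $text);
    my $fac = first_capture($factored,    $text);
    printf "%-12s alternation=%-6s factored=%s\n",
        "'$text'", $alt // '-', $fac // '-';
}
```

Note that the factored form carries an extra capture group ($2), so the two are only equivalent as far as $1 is concerned.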

I admit I wrote the optimization, so I'm tooting my own horn here a bit. :-) But it is worth realizing that alternations in later perls can be significantly faster than other hypothetically equivalent patterns.
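
If you want to check this on your own perl, something along these lines with the core Benchmark module should do. It is only a sketch: the sample lines and the count_matches helper are made up, and the numbers will depend on your perl version and your data, so don't read too much into a single run.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample text; real inputs will behave differently.
my @lines;
for my $vehicle ((qw(car cars bus tram)) x 250) {
    push @lines, "the $vehicle went down the road";
}

my $alternation = qr/(cars|cart|carry|car)/;
my $factored    = qr/(car([st]|ry)?)/;

# Hypothetical helper: count how many sample lines the pattern matches.
sub count_matches {
    my ($re) = @_;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ $re;
    }
    return $count;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { count_matches($alternation) },
    factored    => sub { count_matches($factored) },
});
```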

---
demerphq

Re^2: [Try-out] Regexp do's and don'ts
by muba (Priest) on Mar 28, 2005 at 11:24 UTC
    I am fully aware of the fact that m/[Jj][Aa][Vv][Aa]/ sucks like hell. I just needed a "complex" regex which had a clear goal, in order to demonstrate multi-lining regexes. But I'll add a note: don't try this at home :)




    "2b"||!"2b";$$_="the question"
    Besides that, my code is untested unless stated otherwise.
    One more: please review the article about regular expressions (do's and don'ts) I'm working on.