Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

comment on

( [id://3333] : superdoc . print w/replies, xml ) Need Help??

I cant remember the name of the CPAN version, but its possible to construct a relatively optimized Regex for matching multiple strings by constructing a Patricia Trie. (There are various discussions of this technique on PM.) Then the issue becomes simply

my %replace_hash=(foo=>'bar',baz=>'fnord'); my $regex=compile_regex(keys %replace_hash); s/\b($regex)\b/$replace_hash{$1}/g;

Its actually not difficult to construct the optimized regex, but the result scales poorly. Once you have more than a few dozen words involved the time take in backtracking etc (with or without look forward assertions) becomes signifigant. In that case Ive found that its actually faster to use the Patricia tree directly and not bother with the regex. This would not be true however if we had a choice of a DFA regex or an NFA regex. The Patricia Trie essentially repesents (most of) a DFA state transition table and as such it needs minimal backtracking. In fact it never backtracks over the initial character, advancing one character every match failure, and with further optimization it need not backtrack at all. (DFA's never backtrack, hence the term "deterministic")

update: I wrote a node explaining Patricia Tries here: Re:x2 A Regexp Assembler/Compiler (Whats a 'trie'?)


<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

In reply to Re: 3 Examples of Multiple-Word Search n Replace by demerphq
in thread 3 Examples of Multiple-Word Search n Replace by chunlou

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.