http://qs321.pair.com?node_id=269841


in reply to 3 Examples of Multiple-Word Search n Replace

I cant remember the name of the CPAN version, but its possible to construct a relatively optimized Regex for matching multiple strings by constructing a Patricia Trie. (There are various discussions of this technique on PM.) Then the issue becomes simply

my %replace_hash=(foo=>'bar',baz=>'fnord'); my $regex=compile_regex(keys %replace_hash); s/\b($regex)\b/$replace_hash{$1}/g;

Its actually not difficult to construct the optimized regex, but the result scales poorly. Once you have more than a few dozen words involved the time take in backtracking etc (with or without look forward assertions) becomes signifigant. In that case Ive found that its actually faster to use the Patricia tree directly and not bother with the regex. This would not be true however if we had a choice of a DFA regex or an NFA regex. The Patricia Trie essentially repesents (most of) a DFA state transition table and as such it needs minimal backtracking. In fact it never backtracks over the initial character, advancing one character every match failure, and with further optimization it need not backtrack at all. (DFA's never backtrack, hence the term "deterministic")

update: I wrote a node explaining Patricia Tries here: Re:x2 A Regexp Assembler/Compiler (Whats a 'trie'?)


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...