PerlMonks
(tye)Re: Help on using alternation grouping star versus dot star.
by tye (Sage) on Jun 10, 2001 at 11:59 UTC [id://87278]
(3) will speed up if you replace "(?:.|\n)" with something simpler such as "[\s\S]", or best, add the "s" option to your regex and just use ".".

(1) is slower than (2) because (1) has to dispatch a lot more regex opcodes. That is, (2) can just hang out in the "[^<]*" opcode while it gobbles quite a few characters, while (1) has to leave "[^<]" for each character (leaving the alternation/parens to move to the "*", then coming back in through the alternation/parens to get back to the "[^<]").

You are correct (to my understanding) about the disadvantage of (3). But (3) has an advantage in that it is simpler than (1) and (2). According to ZZamboni's benchmarks, (3)'s disadvantage is only slightly greater than its advantage and slight compared to (1)'s disadvantage. But I suspect this is all rather dependent on the input used in the benchmarks. In particular, the length of the text to be matched and the frequency of <span> tag pairs within it will affect the relative performance characteristics.

As for other alternatives, I see no reason not to simplify things greatly to: and not force the regex engine to match the intervening text at all.

Finally, many regex libraries are built around deterministic finite state automata, which means that different regexes run at much closer to the same speed (though more complex ones take longer to compile), but memory consumption can grow without bound. So Java's regex performance characteristics could be drastically different from Perl's.

- tye (but my friends call me "Tye")
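
To make the first suggestion concrete, here is a minimal sketch showing that "(?:.|\n)", "[\s\S]" and "." under the "s" option all match the same text. The sample string and the <span>...</span> patterns are made up for illustration and are not the patterns (1) to (3) from the thread, and the optional timing run proves nothing beyond this one input.

```perl
use strict;
use warnings;

# Hypothetical sample input (not the data used in the thread).
my $text = "before <span>line one\nline two</span> after";

# Three spellings of "any character, newline included".
# These patterns are illustrative assumptions, not the thread's (1)-(3).
my %patterns = (
    'alternation' => qr{<span>((?:.|\n)*?)</span>},
    'char class'  => qr{<span>([\s\S]*?)</span>},
    's modifier'  => qr{<span>(.*?)</span>}s,
);

# Record what each pattern captures so the results can be compared.
my %captured;
for my $name (sort keys %patterns) {
    if ($text =~ $patterns{$name}) {
        $captured{$name} = $1;
    }
    printf "%-12s captured %d characters\n",
        $name, length($captured{$name} // '');
}

# Optional timing with the core Benchmark module: run with "bench" as
# the first argument. Results depend heavily on the input used.
if (@ARGV && $ARGV[0] eq 'bench') {
    require Benchmark;
    my %subs;
    for my $name (keys %patterns) {
        my $re = $patterns{$name};
        $subs{$name} = sub { my $hit = $text =~ $re; };
    }
    Benchmark::cmpthese(-1, \%subs);
}
```

Run it with no arguments to check that the three forms agree, or with "bench" to get a rough comparison on your own data. Longer inputs with realistic tag frequency would be a better test than this short string.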