http://qs321.pair.com?node_id=597362


in reply to Perl regexp matching is slow??

The author is clearly an idiot :)

The Perl porters have long known about pathological expressions that will continue to run after the heat-death of the universe. This doesn't mean those patterns should be banished, but with a little care, and a little thought, Perl probably offers the informed programmer techniques for supplying hints to the regular expression engine to avoid such exponential blowups.

I also find it interesting that Perl allowed me to write a very small amount of code in order to explore the problem. And the solution here is to not use greedy matching.

Change 'aaa' =~ /a?a?a?aaa/ to /a??a??a??aaa/ and you have to push n out to about 100000 before you begin to notice any appreciable slowdown in the matching. Take the following for a spin:

use strict; use Time::HiRes 'time'; my $n = shift || 3; my $opt = shift || '?'; my $str = 'a' x $n; my $re = ("a$opt" x $n) . $str; my $begin = time; print $str =~ $re, ' ', time-$begin, "\n";

Try running perl rematcher 20 and then perl rematcher 20000 ??. I get a better than three orders of magnitude improvement. Not bad for five minutes thought. Seeing how the Thomson graph climbs slowly but steadily from the beginning, I would not be surprised if Perl continued to outperform it all the way out until things begin to swap.

update: after finishing the article, I retract the statement that the author is an idiot. He does know his stuff, however, I still don't agree with the conclusion. He says that DFAs are better because you don't run into grief with patterns like /(.*)(.*)(.*)(.*)(.*)/ since it doesn't go exponential. This is just silly. If anything, it encourages sloppy programming.

Backreferences allow one to solve a much larger range of problems that DFAs cannot. That's why NFAs are in such widespread use. And if it takes a little education and experience to wield NFAs with assurance, then so be it.

• another intruder with the mooring in the heart of the Perl

Replies are listed 'Best First'.
Re^2: Perl regexp matching is slow??
by Aristotle (Chancellor) on Feb 07, 2007 at 10:05 UTC

    Of note is that Perl in particular already has safeguards in place to prevent exponential catastrophes. At least for some large class of pathological patterns, it won’t match them very quickly, but neither will it explode in nearly the same way as more naïve engines. The downside is that the required bookkeeping imposes a performance hit on all matching, which makes Perl’s engine slower than that of languages with more naïve engines, like Java. tilly wrote a great, long article on this subject some time ago.

    Makeshifts last the longest.

Re^2: Perl regexp matching is slow??
by Anonymous Monk on Sep 27, 2012 at 21:57 UTC
    Thank you a thousand thousand times, grinder! Your note about using non-greedy regexes (seems obvious in hindsight) reduced some of my large text file parse script runtimes by roughly 70%. I was searching very generally for speeding up large regex operations and, as much as my eyes were starting to glaze over, you managed to clearly communicate a rather simple yet important concept without being obtuse or pedantic. If only more searches resulted in such education. :) /salute