Re^2: The Deceiver

in reply to Re: The Deceiver
in thread Why does a Perl 5.6 regex run a lot slower on Perl 5.8?

Thank you for your quick answer, and thanks again for taking the time to explain. Although I agree that the two regular expressions have different meanings, the real question here is why Perl 5.6.1 is 500-1,000 times faster than Perl 5.8.0 on the same regular expression -- this is my real query. Am I to assume that Perl 5.6.1 did not properly parse certain regular expressions and Perl 5.8.0 now does? I just tried your regular expressions and they yielded the same results under both versions. How unstable is my previous code, if new versions can make it obsolete in performance, as if encouraging not to upgrade.

Comment on Re^2: The Deceiver

Replies are listed 'Best First'.
Re^3: The Deceiver by itub (Priest) on Aug 13, 2004 at 14:41 UTC
`#reg.pl $s = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyRRRRyyyy\n" x 500; $n = 0; $n++ while ($s =~ /(.*?)RRRR\1/sg); print "$n matches\n";` [download] time ~/bin/perl5.8.0 reg.pl 500 matches real 0m4.836s user 0m4.800s sys 0m0.010s time ~/bin/perl5.6.1 reg.pl 0 matches real 0m0.020s user 0m0.020s sys 0m0.000s So, in fact, you are complaining that a bug got fixed. The problem is that these are extremely inefficient regular expressions because they involve a lot of backtracking. I recommend reading Mastering Regular Expressions for a detailed explanation.	[reply] [d/l]
Re^4: The Deceiver by perldeveloper (Scribe) on Aug 13, 2004 at 14:55 UTC
That's very good to hear. What's not good is that code that relied on nothing like backreferencing regexps got squashed in the upgrade process. Like japhy guessed, the regexp failed to match, but wouldn't it make sense even for a `/(.*)TEXT\1/` regexp to first look for `/TEXT/` and then worry about getting the appropriate group match (be it greedy or reluctant)? This slowing down is a terrible shock some people might get (including me) when moving old code to new code. But on another note, I do agree I'm a long way from mastering regular expressions.	[reply] [d/l] [select]
Re^5: The Deceiver by tilly (Archbishop) on Aug 14, 2004 at 01:15 UTC
In a perfect world, code should be correct and fast. Perl's code used to be wrong and fast. Now it is correct and slow. Correct and slow is generally better than wrong and fast. In time it is likely to speed up again and we'll all be happy. Unfortunately you began with the rude shock of seeing an amazing slowdown. Therefore while in other circumstances you might agree that you want the right answer, anything below the speed which you were accustomed to is bad. On the specific optimization that you offer, you're right and wrong. You're right that you can optimize that one regular expression that way and it would be good for that regular expression. But it wouldn't speed up the one that you did want to run. Furthermore adding a check for that special case would slow down the compilation of every other regular expression out there (including the one that you wanted to run). Furthermore you've just added a code path that has potential bugs which might not get caught. This is not to say that you never want to speed up special cases - of course you do and the regular expression engine has a lot of special tricks. But you have to balance out what is sped up by any one trick against how it slows other people down and causes opportunities for bugs to lurk. That said, I'd like to point out why the optimization that you point out would not solve your problem. It would tell how to solve a particular expression that you weren't running. The one that you tried to run is different enough that the optimization would probably not run. What you actually would have benefited from is an optimization that says, "Check that there are no backreferences within the RE, then turn on the old special case optimizations." Which might or might not work out to be worthwhile. (And I do not wonder that japhy just chose to turn the optimization off rather than put a test that is that complicated in.)	[reply]
Re^5: The Deceiver by itub (Priest) on Aug 13, 2004 at 15:12 UTC
If I remove the backreference by changing the regex in my example to `$n++ while ($s =~ /(.*?)RRRR/sg);`, I get the following: time ~/bin/perl5.8.0 reg.pl 500 matches real 0m0.018s user 0m0.010s sys 0m0.010s time ~/bin/perl5.6.1 reg.pl 1 matches real 0m0.015s user 0m0.010s sys 0m0.000s So at least in this case Perl 5.8.0 doesn't have a speed problem. I don't know exactly what's going on in your code though.	[reply] [d/l]
Re^6: The Deceiver by itub (Priest) on Aug 13, 2004 at 15:27 UTC
Re^3: The Deceiver by japhy (Canon) on Aug 13, 2004 at 14:08 UTC
I'm not entirely sure why the regexes were so much slower, unless they just never could actually match. In that circumstance, `/.FAIL/` would be a lot slower than `/^.FAIL/`. _____________________________________________________ Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply] [d/l] [select]
Re^4: The Deceiver by Jenda (Abbot) on Aug 17, 2004 at 22:05 UTC
And since perldeveloper removes the "whatever"s from the string the regexp will fail in the last iteration. So I would not be surprised if most of the wasted time was in the last iteration :-) Jenda Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. -- Rick Osborne	[reply]

In Section Meditations