Re: Idiomatic optimizations

Ever since I read in Mastering Regular Expressions that perl makes a copy of the base string when doing a case-insensitive match, I've tried to use character classes instead of /i.

... Before submitting this post, though, I decided to actually benchmark some variations to see whether character classes were faster. To my surprise, it turns out that /i is about 50% faster in the test I used:

use strict;
use Benchmark qw(cmpthese);

my $foo = "abcdefghijklmnopqrstuvwxyz"x500;
my $re = "[Aa][Bb][Cc]";

cmpthese(1000000, {
  'i'       => sub { $foo =~ /abc/ig },
  'chars'   => sub { $foo =~ /[Aa][Bb][Cc]/og },
  'charvar' => sub { $foo =~ /$re/og },
});

yielding these results on my machine:

Benchmark: timing 1000000 iterations of chars, charvar, i...
     chars:  2 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 507614.21/s (n=1000000)
   charvar:  3 wallclock secs ( 2.04 usr + -0.01 sys =  2.03 CPU) @ 492610.84/s (n=1000000)
         i:  1 wallclock secs ( 1.31 usr +  0.00 sys =  1.31 CPU) @ 763358.78/s (n=1000000)
            Rate charvar   chars       i
charvar 492611/s      --     -3%    -35%
chars   507614/s      3%      --    -34%
i       763359/s     55%     50%      --

Results are similar for strings of various lengths. So was Mastering Regular Expressions incorrect, or has the problem just been fixed since it was written?


Re: Re: Idiomatic optimizations by samtregar (Abbot) on Apr 30, 2002 at 07:59 UTC
The problem with //i isn't (wasn't?) that it's slower on small strings. It's that it uses twice the memory of an equivalent character class, and when you start matching against huge strings that can really make a difference. Try your example against a 50MB string and I think you'll see what I mean. If not, you can justly castigate me for being too lazy to test my own assertions.
Eagerly awaiting the second edition,
-sam
Re: Re: Re: Idiomatic optimizations by hakkr (Chaplain) on Apr 30, 2002 at 11:40 UTC
Always pass references, not data structures; the \ operator is an optimisation: sub(\@array) instead of sub(@array).
Only use what you need from modules: `use CGI qw(:standard);`
Also I like shortcut operators: `my $i ||=0 ; my $i =shift || 0;`
Also I like the ? operator instead of ifs: `$i?$i=1:$i=0;`
Is !~ an optimisation over just negating the result of =~? I dunno, but I think !~ looks better.
Re: Re: Re: Re: Idiomatic optimizations by Joost (Canon) on May 01, 2002 at 08:29 UTC
Just a few remarks (though I've got a sneaking suspicion I'm correcting typos here):
> `my $i ||=0 ;`
Here $i will always turn out to be 0 (because of the my operator), so `my $i=0;` is more efficient.
> `$i?$i=1:$i=0;`
How about `$i=$i?1:0;`? That's also a bit more readable (at least to my eyes).
Joost.
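As a side note on the second quoted snippet: perlop documents that ?: binds more tightly than assignment, so `$i?$i=1:$i=0;` parses as `($i ? ($i = 1) : $i) = 0;` and should leave $i at 0 whichever branch is taken. A minimal sketch to check this; the sub names are made up for illustration and are not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: the unparenthesized form quoted above.
# It parses as ($i ? ($i = 1) : $i) = 0, so the trailing "= 0" always wins.
sub ternary_unparenthesized {
    my ($i) = @_;
    $i ? $i = 1 : $i = 0;
    return $i;
}

# Hypothetical helper: the form suggested in the reply.
sub ternary_assigned {
    my ($i) = @_;
    $i = $i ? 1 : 0;
    return $i;
}

for my $value (0, 5) {
    printf "input %d: unparenthesized gives %d, assigned gives %d\n",
        $value, ternary_unparenthesized($value), ternary_assigned($value);
}
```

For an input of 5 the first form yields 0 and the second yields 1, so the rewrite is a correctness fix as well as a readability one.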
Re^4: Idiomatic optimizations by tadman (Prior) on May 01, 2002 at 08:37 UTC
Don't forget that ?: can get dangerous, not unlike juggling running chainsaws. It's a great show, but you're liable to injure yourself something fierce:
`$foo = $a? $b? $c : $d? $e : $f : $g : $h;`
Sometimes an if is more verbose, but undeniably precise. Instead of getting carried away with ?:, you can sometimes compact it using the regular logical operators `||` and `&&`. It really depends on what you're working with.
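To illustrate the point about `||`, here is a small sketch of a default value written both ways. The sub names and the default of 25 are assumptions chosen for the example, not anything from the thread:

```perl
use strict;
use warnings;

# Hypothetical sub: default via ||, in the style of `my $i = shift || 0;`
sub page_size {
    my $size = shift || 25;   # any false value (undef, 0, '') falls back to 25
    return $size;
}

# The same logic spelled out with an if.
sub page_size_verbose {
    my $size = shift;
    if (!$size) {
        $size = 25;
    }
    return $size;
}

print page_size(10), " ", page_size_verbose(10), "\n";
print page_size(),   " ", page_size_verbose(),   "\n";
```

One caveat with the short form: a legitimate 0 is also replaced by the default, which is where the more verbose if can be the safer choice.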
Re: Re^4: Idiomatic optimizations by demerphq (Chancellor) on May 02, 2002 at 13:13 UTC
Re: Re: Re: Re: Idiomatic optimizations by Juerd (Abbot) on May 01, 2002 at 16:44 UTC
> $i?$i=1:$i=0;
Puh-lease, use some whitespace! Here are some alternatives:

$i ? $i = 1 : $i = 0;
$i = $i ? 1 : 0;
$i = !!$i || 0;

$foo = $foo ? 1 : 0;  # Single-letter variable names:
                      # easy to type, hard to read

> Always pass references not data structures
Most references are the root of data structures, so I think you meant "Always pass references instead of flattened hashes or lists". Note that you can't use this if the sub in question doesn't expect it.

- Yes, I reinvent wheels.
- Spam: Visit eurotraQ.
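The difference between passing a reference and passing a flattened list can be sketched as follows. Both subs are hypothetical and exist only to show what each calling convention has to expect:

```perl
use strict;
use warnings;

# Hypothetical sub that expects an array reference.
sub sum_ref {
    my ($numbers) = @_;        # copies one scalar (the reference)
    my $total = 0;
    $total += $_ for @$numbers;
    return $total;
}

# Hypothetical sub that expects a flattened list.
sub sum_list {
    my @numbers = @_;          # copies every element into a new array
    my $total = 0;
    $total += $_ for @numbers;
    return $total;
}

my @array = (1 .. 10);
print sum_ref(\@array), "\n";
print sum_list(@array), "\n";
```

Both calls give the same sum, but the two subs are not interchangeable: handing `\@array` to `sum_list` would add up the numeric value of the reference itself, which is the "doesn't expect it" case mentioned above.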
Re: Re: Re: Idiomatic optimizations by thelenm (Vicar) on Apr 30, 2002 at 16:59 UTC
Nah, no castigation here. When I whipped up my test, I had thought that the alphabet x 500 was a pretty big string, but now that I'm thinking clearly, that's not very big at all. To test out a really big string, I replicated Romeo and Juliet 500 times, read the whole thing into a string, then ran almost the same regular expressions. I removed /o from the 'chars' sub, which actually made it a little faster. The string was about 70 MB. Here is my new test code:

use strict;
use Benchmark qw(cmpthese);

# Slurp the whole file into one string
local $/ = undef;
open IN, "romeo-and-juliet-500-times.txt" or die "Can't open input: $!";
my $text = <IN>;
close IN;

# Ten iterations is enough with a 70 MB string!
cmpthese(10, {
  'i'     => sub { $text =~ /abc/ig },
  'chars' => sub { $text =~ /[Aa][Bb][Cc]/g },
});

To my surprise (again!), the /i version ran in about 1/3 the time of the character-class version. Here is the output on my machine:

Benchmark: timing 10 iterations of chars, i...
     chars: 40 wallclock secs (38.37 usr +  0.04 sys = 38.41 CPU) @  0.26/s (n=10)
         i: 12 wallclock secs (11.43 usr +  0.01 sys = 11.44 CPU) @  0.87/s (n=10)
      s/iter chars    i
chars   3.84    -- -70%
i       1.14  236%   --

I'm amazed. Am I not testing the right thing? Or has /i really been cleaned up in recent versions of Perl? I'm running 5.6.1.
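On the "right thing" question, one detail that may be worth checking: with /g in scalar context, each match resumes from pos() where the previous one ended, so repeated calls walk through the string instead of starting over. Whether that affects the timings above depends on the context in which Benchmark calls the subs, which is not verified here. A minimal sketch of the behaviour itself, on a made-up string:

```perl
use strict;
use warnings;

my $text = "abc ABC abc";
my @positions;

# In scalar context, each /g match picks up from pos($text)
# and the loop ends when no further match is found.
while ($text =~ /abc/ig) {
    push @positions, pos($text);
}

print "match ends: @positions\n";   # prints "match ends: 3 7 11"
```

Since both benchmark subs use /g, any such effect would apply to the /i and character-class variants alike.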


There's more than one way to do things
	PerlMonks