I think you're missing the point of the post. Perhaps I should've given more details on the methods used, but I thought that the wording in the questions ("the most executions per second") showed that the snippets were evaluated using timethis or cmpthese, generally in the absence of any real data values. Also, see my response to Arguile (once I get time to reply).
While researching questions 1 & 2 I used both timethis and cmpthese in many different programs. I am well aware that using $POSTMATCH and $PREMATCH can taint the compilation (not just the execution) of other regexen in the same program. This was pointed out by shotgunefx in a reply to Support for hash comments on a line. That premise was what sparked my interest in benchmarking. Not that I didn't believe shotgunefx, since he was echoing information also stated in Programming Perl; I wanted to know the extent of the effect and see some metrics. BTW, a->fastolfe, b->me, c->buckaduck, d->blakem and e->demerphq.
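As a small illustration of why snippet d can avoid $PREMATCH altogether, here is a minimal sketch (the sample line and variable names are assumptions made up for this example) that rebuilds the prematch, match and postmatch from the @- and @+ offset arrays:

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only for illustration.
my $line = 'print "hi";  # greet';

my ($pre, $match, $post);
if ($line =~ /#/) {
    # $-[0] is the offset where the match starts, $+[0] where it ends.
    $pre   = substr($line, 0, $-[0]);               # what $` would hold
    $match = substr($line, $-[0], $+[0] - $-[0]);   # what $& would hold
    $post  = substr($line, $+[0]);                  # what $' would hold
}

print "pre=[$pre] match=[$match] post=[$post]\n";
```

Because the special variables never appear in the source, this form should not trigger the program-wide cost described above.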
The following program is what I used for the results of questions 1 & 2:
use strict;
use Benchmark;

# Candidate snippets, keyed by the letters used in the questions.
my %snippet = (
    a => 's/\s*#.*//',
    b => '$_ = $` if /#/',
    c => 'my ($line) = split/#/',
    d => '$_ = substr($_, 0, $-[0]) if /#/',
    e => 'if ((my $p = index($_, "#")) > -1) { substr($_, $p, -1, "") }'
);

# Time the snippet named on the command line for at least 10 CPU seconds.
if (exists $snippet{$ARGV[0]}) {
    print "Processing $ARGV[0] snippet: '", $snippet{$ARGV[0]}, "'\n";
    timethis (-10, $snippet{$ARGV[0]});
} else {
    print "Missing or invalid parameter to single1.p\n\n";
}
The results table (since I was running timethis in separate program runs, I compiled the following table in an Excel spreadsheet from the averaged results of three runs each):
        Rate     c     e     a     b     d
c   521648/s    --   -4%  -61%  -68%  -68%
e   543110/s    4%    --  -59%  -66%  -67%
a  1335220/s  156%  146%    --  -17%  -19%
b  1614068/s  209%  197%   21%    --   -2%
d  1639269/s  214%  202%   23%    2%    --
Also, I suppose I could have accepted b as a correct response as well, since I had specified a 3% error margin elsewhere, but then I probably ran the benchmarks from this program over 30 times with very similar results. I'll also add that these benchmarks were done after ending all tasks except for Explorer and Systray on Windows 98SE. This boosted the executions per second and lessened the margin of error between runs but did not change the final percentages much.
My experiments with using cmpthese instead:
use strict;
use Benchmark qw(cmpthese);

# Compare all five snippets in one program, at least 10 CPU seconds each.
cmpthese (-10, {
    a => 's/\s*#.*//',
    b => '$_ = $` if /#/',
    c => 'my ($line) = split/#/',
    d => '$_ = substr($_, 0, $-[0]) if /#/',
    e => 'if ((my $p = index($_, "#")) > -1) { substr($_, $p, -1, "") }'
});
These experiments point out that, had I used cmpthese instead of timethis, I could have skewed the results in my favor (b is my snippet). Knowing that someone might question the results, I opted for the honest approach (which is not to say that I may not have erred unknowingly). Notice, however, in the three consecutive runs below that the only snippets significantly affected, as far as questions 1 & 2 are concerned, were mine and yours (d is blakem's snippet). I originally had more questions based on those snippets but thought they were getting too tedious and omitted them. Yes, a is slowed by about 3-4% (a is fastolfe's snippet), but that is irrelevant to questions 1 & 2.
        Rate     c     e     a     d     b
c   518715/s    --   -2%  -62%  -67%  -68%
e   530337/s    2%    --  -61%  -66%  -67%
a  1364925/s  163%  157%    --  -14%  -15%
d  1582165/s  205%  198%   16%    --   -1%
b  1600284/s  209%  202%   17%    1%    --

        Rate     c     e     a     d     b
c   508712/s    --   -5%  -63%  -68%  -69%
e   532810/s    5%    --  -61%  -66%  -68%
a  1365789/s  168%  156%    --  -14%  -17%
d  1589576/s  212%  198%   16%    --   -3%
b  1639949/s  222%  208%   20%    3%    --

        Rate     c     e     a     d     b
c   509240/s    --   -6%  -63%  -68%  -68%
e   544492/s    7%    --  -60%  -66%  -66%
a  1365411/s  168%  151%    --  -15%  -15%
d  1599654/s  214%  194%   17%    --   -1%
b  1608742/s  216%  195%   18%    1%    --
As for questions 3 & 4, I ran the '' snippet to try to establish some sort of baseline for the execution results, and out of pure curiosity. With a one-second timethis run, the benchmark ran for about 5 minutes or so (I didn't time it by the wall clock) and then produced the error displayed. I interpret this (maybe erroneously) to mean that the iteration count was incrementing so quickly that it exceeded the integer range before the 1 second time interval elapsed. The same holds true for the ';' snippet, since it is essentially the same as '' once compiled (an assumption). The '{}' snippet, on the other hand, completed within the 1 second time interval on my computer. This means that someone running the same benchmark on a faster CPU will probably (an assumption) get the runtime error on all three snippets. On a slower CPU (again, an assumption), it is possible that all three snippets would complete without error.
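One way to probe the three near-empty snippets without relying on a time limit is to hand timethis a positive iteration count, in which case Benchmark runs exactly that many iterations instead of estimating how many fit into the interval. The following is only a sketch under that assumption; the count of 1,000,000 and the %baseline hash are arbitrary choices for illustration, and the timings it reports will be close to zero and mostly noise:

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Arbitrary fixed iteration count (an assumption, not a value from the post).
my $iterations = 1_000_000;

# Time each near-empty snippet for exactly $iterations iterations.
my %baseline;
for my $snippet ('', ';', '{}') {
    # The third argument is just the title printed in front of the result.
    $baseline{$snippet} = timethis($iterations, $snippet, "'$snippet'");
}
```

Benchmark may print a "too few iterations" warning here, which is expected for code that does almost nothing.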
I also ran some 'real world' benchmarks to determine which of the snippets in questions 1 & 2 ran the fastest on 20MB data sets and came up with interesting results, but as you can see this node is already overlong. If there are people interested in seeing those results, /msg me and I'll post them on my scratch pad. I don't want to bore everyone with tons of programs and metrics without knowing if there was enough interest to warrant it.
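For anyone who wants to try the snippets against an actual data value in the meantime, here is a minimal sketch. It is not one of the programs used for the results above: the sample line, the fixed count and the use of code refs are all assumptions for illustration. Snippet b is left out so that $` does not affect the other regexen in the same program.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical sample line; not taken from the 20MB data sets.
my $sample = "my \$x = 1;   # set x\n";

# Code refs instead of strings, so each iteration works on a fresh copy
# of the sample and always has a comment to strip.
my %code = (
    a => sub { local $_ = $sample; s/\s*#.*//; $_ },
    c => sub { local $_ = $sample; my ($line) = split /#/; $line },
    d => sub { local $_ = $sample; $_ = substr($_, 0, $-[0]) if /#/; $_ },
    e => sub {
        local $_ = $sample;
        if ((my $p = index($_, "#")) > -1) { substr($_, $p, -1, "") }
        $_;
    },
);

# Show what each snippet leaves behind before timing anything.
for my $name (sort keys %code) {
    (my $shown = $code{$name}->()) =~ s/\n/\\n/g;
    print "$name: [$shown]\n";
}

# Fixed count (an arbitrary choice) keeps the run short.
timethese(200_000, \%code);
```

Copying the sample on every call adds the same overhead to each snippet, so the rates from a run like this are not directly comparable with the tables above. The printed results also make it easy to see that the snippets do not all leave exactly the same string behind.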
--Jim