Anyone willing to improve the benchmark?
Update: Added regexwise.
Anyone willing to improve the benchmark?
They couldn't make it much worse :)
Incorporating one-off setups (or even a test for one-off setups) into the benchmark subs is like including a car's build time in its race time.
If the substr code is meant to reflect my second option, you've completely misunderstood the purpose of the substr refs and of assigning through a fixed scalar buffer.
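A minimal sketch of the substr-refs idea, assuming fixed-width records exactly as long as the mask, and a mask in which '1' marks a character to keep (the convention the unpack and bitops variants in the later script use). `make_extractor` is a hypothetical name, and this is an illustration of the technique rather than the code of the "second option" referred to here. The references are taken once, during setup; per record, the only work is one copy into the buffer and one read through each reference.

```perl
use strict;
use warnings;

# Hypothetical helper: build an extractor for a fixed-width 0/1 mask,
# where '1' marks a character to keep and '0' one to drop.
sub make_extractor {
    my ($mask) = @_;
    # Fixed scalar buffer, the same length as the mask.
    my $buf = "\0" x length $mask;
    # One-off setup: take references to lvalue substr()s covering
    # each run of kept characters in the buffer.
    my @refs;
    push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) while $mask =~ /1+/g;
    return sub {
        my ($line) = @_;
        # Copy the record into the buffer in place.
        # Assumes each record is exactly as long as the mask.
        substr( $buf, 0, length $line ) = $line;
        # Read the kept runs back through the stored references.
        return join '', map { $$_ } @refs;
    };
}
```

For example, `make_extractor('0110')->('abcd')` returns `'bc'`.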
It's also traditional to post the results of a typical run.
I may have a go at producing a more realistic benchmark later tonight. The key ingredient is that you must not exclude the IO, buffering and memory handling when benchmarking IO processing of large files. Yours excludes all of these.
Hint: You cannot do IO filter benchmarks using the Benchmark module. The only realistic test is to time the processing of actual files that are big enough not to fit into the file cache, and you must ensure that the cache is flushed between runs.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
1..4
ok 1 - array
ok 2 - substr
ok 3 - pack
ok 4 - regex
               Rate arraywise regexwise packwise bitwise substrwise
arraywise    38.9/s        --      -75%     -86%    -98%      -100%
regexwise     153/s      294%        --     -43%    -93%      -100%
packwise      270/s      593%       76%       --    -87%      -100%
bitwise      2077/s     5239%     1255%     671%      --       -99%
substrwise 187563/s   481985%   122274%   69480%   8930%         --
I guessed that my solution is as slow as I am... ;-)
McA
BTW: There is a problem with your substrwise test.
The first time through, $mask is undefined, so you set up @mask and set $mask.
But on the second and subsequent times through, $mask is defined, so the non-state variable @mask is left empty and you don't do any actual work.
That probably explains the surprisingly high apparent efficiency of substrwise in the figures McA posted.
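The pattern described above can be reproduced in isolation. This is a hypothetical reconstruction, since the substrwise code itself is not quoted here; `buggy`, `fixed` and the 4-character mask are made-up names. The mask is cached in a `state` variable, but the spans are kept in an ordinary `my` array, so only the first call does any work. Keeping the precomputed spans in state as well, initialised exactly once, is one way to fix it.

```perl
use strict;
use warnings;
use feature 'state';

sub buggy {
    my ($line) = @_;
    state $mask;    # survives between calls
    my @mask;       # BUG: re-created (empty) on every call
    unless ( defined $mask ) {
        $mask = '0110';
        # Record [offset, length] of each run of '0's (chars to drop).
        push @mask, [ $-[0], $+[0] - $-[0] ] while $mask =~ /0+/g;
    }
    # Delete from the right so earlier offsets stay valid.
    substr( $line, $_->[0], $_->[1], '' ) for reverse @mask;
    return $line;
}

sub fixed {
    my ($line) = @_;
    # The spans are computed once and the arrayref persists across calls.
    state $spans = do {
        my $mask = '0110';
        my @s;
        push @s, [ $-[0], $+[0] - $-[0] ] while $mask =~ /0+/g;
        \@s;
    };
    substr( $line, $_->[0], $_->[1], '' ) for reverse @$spans;
    return $line;
}
```

Calling `buggy('abcd')` twice returns `'bc'` and then `'abcd'`; `fixed('abcd')` returns `'bc'` every time.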
while $mask =~ /1+/g
to get the same results as the other methods.
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my @benches = (
    sub {
        printf 'unpack: ';
        my $mask = shift;
        # one-off setup: runs of 0s become xN (skip), runs of 1s become aN (keep)
        my $templ;
        while( $mask =~ /((.)\2*)/g ) {
            $templ .= (qw(x a))[$2] . length $1;
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            # parens keep ++$count out of join's argument list
            ( $out = join '', unpack( $templ, $_ ) ), ++$count while <$fh>;
            $count;
        }
    },
    sub {
        printf 'substr: ';
        my $mask = shift;
        # one-off setup: [offset, length] of each run of 0s (chars to drop)
        my @mask;
        while ( $mask =~ /0+/g ) {
            push @mask, [ $-[0], ( $+[0] - $-[0] ) ];
        }
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( defined( $out = <$fh> ) ) {
                # delete runs right-to-left so earlier offsets stay valid
                substr( $out, $mask[-$_][0], $mask[-$_][1], '' ) for 1 .. @mask;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf 'substrref: ';
        my $mask = shift;
        # one-off setup: a fixed scalar buffer plus refs to lvalue substrs into it
        my $buf = chr(0); $buf x= 400_000;
        # NB: /0+/ refs the runs the other variants drop; /1+/ would ref the kept runs
        my @refs; push @refs, \substr( $buf, $-[0], $+[0] - $-[0] ) while $mask =~ /0+/g;
        return sub {
            my $fh = shift;
            my $count = 0;
            my $out;
            while( <$fh> ) {
                substr( $buf, 0 ) = $_;
                $out = join'', map $$_, @refs;
                ++$count;
            }
            $count;
        }
    },
    sub {
        printf "bitops: ";
        my $mask = shift;
        # '0' -> "\x00", '1' -> "\xff", so the AND zeroes the dropped chars
        $mask =~ tr[01][\x00\xff];
        return sub {
            my $fh = shift;
            my $count = 0;
            # mask each record, then delete the NULs left by the AND
            $_ &= $mask, tr[\x00][]d, ++$count while <$fh>;
            $count;
        }
    },
);
$|++;
our $OPT //= 0;
our $FLUSHFILE //= '10gb.csv';
our $TESTFILE //= '1023727.dat';
our $S //= 1;
srand $S;
my $mask = join '', map int( rand 2 ), 1 .. 400_000;
# read a large unrelated file first to flush the file cache
open I, '<', $FLUSHFILE or die $!;
1 while <I>;
close I;
my $start = time;
my $run = $benches[ $OPT ]->( $mask );
open I, '<', $TESTFILE or die $!;
my $records = $run->( \*I );
close I;
my $stop = time;
printf "Took %f seconds for %u records (%f recs/second)\n",
    $stop - $start, $records, $records / ($stop - $start);
__END__
C:\test>for /l %n in (0,1,3) do @1023727 -OPT=%n
unpack: Took 164.702357 seconds for 2606 records (15.822482 recs/second)
substr: Took 2971.481218 seconds for 2606 records (0.877004 recs/second)
substrref: Took 154.501948 seconds for 2606 records (16.867101 recs/second)
bitops: Took 12.534998 seconds for 2606 records (207.897916 recs/second)
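The bitops variant is by far the fastest in these timings. It turns the 0/1 mask into a string of "\x00" and "\xff" bytes, ANDs each record with it so that dropped positions become "\x00", and then deletes the NULs with tr///d. A minimal standalone sketch of the same trick, assuming the records themselves contain no NUL bytes (any that did would be deleted too); `bitmask_filter` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that keeps the characters of a
# record wherever the 0/1 mask has a '1'.
sub bitmask_filter {
    my ($mask) = @_;
    # '0' -> "\x00" (clear every bit), '1' -> "\xff" (keep every bit).
    ( my $bytes = $mask ) =~ tr/01/\x00\xff/;
    return sub {
        my ($rec) = @_;
        # String bitwise AND: masked-out bytes become "\x00", and the
        # result is truncated to the shorter of the two operands.
        $rec &= $bytes;
        # Delete the NUL bytes, leaving only the kept characters.
        $rec =~ tr/\x00//d;
        return $rec;
    };
}
```

Because string `&` truncates to the shorter operand, a trailing newline beyond the mask length is dropped as well: `bitmask_filter('0110')->("abcd\n")` returns `'bc'`.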
Thanks! So, could you unpack (har, har) the part where you use "x3a7x2a4x2a7"? I am still not getting these templates.
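One way to read such a template, based on how the unpack variant in the script above builds it: the loop walks the mask in runs of identical characters, turning a run of N '0's into `xN` (skip N bytes) and a run of N '1's into `aN` (take N bytes as one string field). So "x3a7x2a4x2a7" means: skip 3, keep 7, skip 2, keep 4, skip 2, keep 7. The sketch below repeats that construction for a hypothetical 25-character mask chosen to produce exactly that template (the benchmark itself uses a random 400,000-character mask) and applies it to a single record; `mask_to_template` is a made-up name.

```perl
use strict;
use warnings;

# Build an unpack template from a 0/1 mask, the same way the unpack
# variant does: runs of '0' -> xN (skip), runs of '1' -> aN (keep).
sub mask_to_template {
    my ($mask) = @_;
    my $templ = '';
    # $1 is the whole run, $2 its first character ('0' or '1').
    while ( $mask =~ /((.)\2*)/g ) {
        $templ .= ( qw(x a) )[$2] . length $1;
    }
    return $templ;
}

# Hypothetical mask: 3 drop, 7 keep, 2 drop, 4 keep, 2 drop, 7 keep.
my $mask  = '000' . '1111111' . '00' . '1111' . '00' . '1111111';
my $templ = mask_to_template($mask);
print "$templ\n";    # prints: x3a7x2a4x2a7

# unpack returns one string per aN field; join '' glues the kept parts.
my @fields = unpack $templ, 'ABCDEFGHIJKLMNOPQRSTUVWXY';
print join( '|', @fields ), "\n";    # prints: DEFGHIJ|MNOP|STUVWXY
```

In the benchmark the fields are joined with `''`, giving the kept characters as one string.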